GENES IN THE NON-RECOMBINING
REGION OF THE Y CHROMOSOME 

GOVERNMENT SUPPORT 

The invention described herein was made in whole or in 
part with government support under Grant Number HG00257
awarded by the National Institutes of Health. The United 
States Government has certain rights in the invention. 

RELATED APPLICATIONS 

This application claims the benefit of U.S.
Provisional Application No. 60/041,877, filed April 11,
1997, entitled "Genes in the Non-Recombining Region of the
Y Chromosome" by Bruce T. Lahn and David C. Page. The
entire teachings of the above-referenced application are
expressly incorporated herein by reference.

BACKGROUND OF THE INVENTION

The human Y chromosome is distinguished from all other 
nuclear chromosomes by four characteristics: the absence of 
recombination, its presence in males only, its common 
ancestry and persistent meiotic relationship with the X
chromosome, and the tendency of its genes to degenerate
during evolution (J. J. Bull, Evolution of Sex Determining
Mechanisms (Benjamin Cummings, Menlo Park, CA, 1983); J. A.
Graves, Annu. Rev. Genet. 30:233 (1996); B. Charlesworth,
Curr. Biol. 6:149 (1996); W. R. Rice, Bioscience 46:331
(1996)). To be precise, these distinctive characteristics
apply only to the non-recombining portion or region of the
Y chromosome (NRY), which comprises 95% of the human Y
chromosome. The remaining 5% of the chromosome is composed
of two pseudoautosomal regions that maintain sequence
identity with the X chromosome by meiotic recombination (H.
J. Cooke et al., Nature 317:687 (1985); M. C. Simmler et
al., Nature 317:692 (1985); D. Freije et al., Science
258:1184 (1992); G. A. Rappold, Hum. Genet. 92:315 (1993)).
Given the NRY's peculiar characteristics, one might expect
its gene content to be idiosyncratic. Since discovery of
the Y chromosome in 1923, its gene content has been the
subject of speculation. By the middle of this century,
while studies of human pedigrees had identified many traits
exhibiting autosomal or X-linked inheritance, no convincing
cases of Y-linked inheritance could be found (T. S.
Painter, J. Exp. Zool. (1923); C. Stern, Am. J. Hum. Genet.
5:147 (1957)). As a result, a consensus began to emerge that
the Y chromosome carried few, if any, genes. In 1959,
reports of XO females and XXY males established the
existence of a sex-determining gene on the human Y
chromosome (P. A. Jacobs et al., Nature 183:302 (1959); C.
E. Ford et al., Lancet i:711 (1959)), but this was
perceived as a special case on a generally desolate
chromosome. Opinions began to change only during the past
decade, when eight NRY transcription units (or families of
closely related transcription units) were identified, most
during regionally focused, positional cloning experiments
(D. C. Page et al., Cell 51:1091 (1987); A. H. Sinclair et
al., Nature 346:240-244 (1990); J. Arnemann et al.,
Genomics 11:108 (1991); E. C. Salido et al., Am. J. Hum.
Genet. 50:303 (1992); E. M. Fisher et al., Cell 63:1205
(1990); K. Ma et al., Cell 75:1287 (1993); A. I. Agulnik et
al., Hum. Mol. Genet. 3:879 (1994); R. Reijo et al., Nat.
Genet. 10:383 (1995)). It was not known whether there were
more genes in the NRY.

SUMMARY OF THE INVENTION 

A systematic search of the non-recombining region of 
the human Y chromosome (NRY) has identified 12 novel genes 
or gene families. All 12 novel genes, and six of eight NRY 
genes or families previously isolated by less systematic 
means, fall into two classes. The first class of genes 
exists in one copy and is expressed in many organs; they 
have functional X homologs that escape X inactivation, as 
predicted for genes involved in Turner (XO) syndrome. The 
second class consists of Y-chromosomal gene families
expressed specifically in testes, and may account for 
infertility among men with Y deletions. 

The genes described herein, portions of the genes and 
DNA which hybridizes to genes or gene portions described 
are useful in diagnostic methods, such as a method to 
identify individuals in whom all or a portion of a gene or 
genes of the NRY is missing or altered. For example, Y
chromosomal DNA from males with a known condition, such as 
infertility or reduced sperm count, can be assessed, using 
the gene(s) described herein, or characteristic portions 
thereof, to determine whether their DNA lacks some or all 
of the gene(s) described herein or contains an altered 
gene(s) (e.g., a gene in which there is a deletion, 
substitution, addition or mutation, compared to the 
sequences presented herein). Y chromosomal DNA (e.g., from 
a male with reduced sperm count or viability) can be 
assessed, using DNA described herein or DNA which 
hybridizes to DNA described herein, to determine whether 
the condition is associated with or caused by the 
occurrence of the gene or the gene alteration. For 
example, the presence or absence of all or a portion of a 
gene or genes shown to be necessary for fertility or
adequate sperm count can be assessed, using DNA which
hybridizes to the gene or genes of interest, to determine
the basis for the infertility or reduced sperm count. In
one embodiment, the occurrence of one or more Y-specific
genes or a characteristic portion of one or more Y-specific
genes is assessed in Y chromosomal DNA. In another
embodiment, deletion or alteration of one of the
testis-specific (Y-specific) genes described is assessed,
such as by a hybridization method in which DNA which
hybridizes to one of the Y-specific genes described herein
or a characteristic portion thereof is used to assess a DNA
sample obtained from a male who has a reduced sperm count.
Lack of hybridization of the Y-specific DNA probe to DNA in
the sample indicates that the gene is not present in sample
DNA or is present in an altered form which does not
hybridize to Y-specific DNA of the present invention. In
another embodiment, an X-homologous gene or genes present
on the NRY can be used to determine whether the gene is
present in an individual or if it occurs in an altered form
in the individual. Using known methods, such as
hybridization methods, X or Y chromosomal DNA from an
individual can be assessed for the presence or absence of
one or more of the X-homologous genes or a characteristic
portion of one or more X-homologous genes. X or Y
chromosomal DNA can also be assessed for the presence or
absence of an altered form of one or more of the
X-homologous genes described. In the present methods, DNA
can be analyzed for the occurrence of Y-specific DNA,
X-homologous genes or both. For example, a "battery" or
group of DNA probes (sequences) can be used to analyze
sample DNA; the probes can include Y-specific DNA probes
(e.g., DNA which hybridizes to a Y-specific gene),
X-homologous gene probes (e.g., DNA which hybridizes to an
X-homologous gene) or both types of probes. DNA described
herein is also useful as primers in an amplification
method, such as PCR, useful for identifying and amplifying
Y-specific DNA or X-homologous genes in a sample (e.g., Y
chromosomal DNA). Further, proteins or peptides encoded by
the DNA described herein, such as proteins or peptides
encoded by an X-homologous gene or proteins or peptides
encoded by testis-specific DNA (a testis-specific gene),
can be assessed in samples. This can be carried out, for
example, using antibodies which recognize proteins or
peptides of the present invention (proteins or peptides
encoded by DNA described herein).
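
The "battery" of probes described above lends itself to a simple tabulation of which probes fail to give a hybridization signal. The following is only a minimal sketch under stated assumptions: the probe names are the gene symbols introduced later in this document, while the function name, the grouping into two sets and the result format are hypothetical and are not part of the described method. A missing signal is reported as such, since lack of hybridization may reflect either absence of the gene or an altered form, and X-homologous probes may cross-hybridize to their X homologs.

```python
# Probe names are gene symbols used in this document; the two sets and
# the result format are illustrative assumptions, not a described protocol.
Y_SPECIFIC_PROBES = frozenset({"CDY", "BPY1", "BPY2", "XKRY", "PTPRY"})
X_HOMOLOGOUS_PROBES = frozenset({"DBY", "TPRY", "TB4Y", "EIF1AY", "DFFRY"})


def summarize_probe_battery(hybridized):
    """Group probes that gave no signal by class.

    hybridized maps a probe name to True (signal detected) or
    False (no signal detected); probes not in the mapping are
    reported as not tested.
    """
    known = Y_SPECIFIC_PROBES | X_HOMOLOGOUS_PROBES
    unknown = set(hybridized) - known
    if unknown:
        raise ValueError(f"unrecognized probes: {sorted(unknown)}")

    def no_signal(probes):
        # Only an explicit False counts as "no signal".
        return sorted(p for p in probes if hybridized.get(p) is False)

    return {
        "y_specific_no_signal": no_signal(Y_SPECIFIC_PROBES),
        "x_homologous_no_signal": no_signal(X_HOMOLOGOUS_PROBES),
        "not_tested": sorted(known - set(hybridized)),
    }


if __name__ == "__main__":
    print(summarize_probe_battery({"CDY": False, "DBY": True, "BPY2": False}))
```

Any probe reported under "no signal" would still need follow-up, for example by PCR with Y-specific primers, before a deletion is inferred.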



BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a gene map of the non-recombining region 
of the Y chromosome. 

Figure 2 shows the amino acid sequence alignments of
the chromodomain (SEQ ID NO.: 1-6) and putative catalytic
domain (SEQ ID NO.: 7-12) of human CDY genes with their
respective homologs. Amino acid identities are indicated
by black shading and, for each protein, the first and last
amino acid residues are numbered (with respect to the
initiator methionine) and the total length of the protein
is indicated. Chromodomain: SEQ ID NO.: 1, CDY (human);
SEQ ID NO.: 2, HP1 (Drosophila); SEQ ID NO.: 3, Polycomb
(Drosophila); SEQ ID NO.: 4, CHD1 (Drosophila); SEQ ID NO.:
5, Su(var)3-9 (Drosophila); SEQ ID NO.: 6, PDD1
(Tetrahymena); SEQ ID NO.: 7; Covalent modification domain:
SEQ ID NO.: 8, CDY (human); SEQ ID NO.: 9, Enoyl-CoA
Hydratase (Human); SEQ ID NO.: 10, 4-CBA-CoA dehalogenase
(Arthrobacter); SEQ ID NO.: 11, Crotonase (C.
acetobutylicum); SEQ ID NO.: 12, Naphthoate synthase (E.
coli).

Figures 3A and 3B are the nucleic acid sequence of DBX
(long and short transcripts, SEQ ID NO: 13 and SEQ ID NO:
14, respectively) and the encoded amino acid sequences (SEQ
ID NO: 15 and SEQ ID NO: 16, respectively), DBY (SEQ ID
NO: 17) and the encoded amino acid sequence (SEQ ID NO:
18). Dots in the DBX DNA and protein sequences indicate
that the nucleic acids or amino acid residues are the same
as those represented for DBY; dashes indicate a missing
nucleic acid or amino acid residue.

Figures 4A and 4B present the nucleic acid sequences
for three forms of TPRY (short, medium and long, SEQ ID NO:
19, SEQ ID NO: 20 and SEQ ID NO: 21, respectively) and the
encoded amino acid sequences for the short, medium and long
forms (SEQ ID NO: 22, SEQ ID NO: 23 and SEQ ID NO: 24,
respectively).

Figure 5 presents the nucleic acid sequences of TB4X
(SEQ ID NO: 25) and TB4Y (SEQ ID NO: 26) and the encoded
amino acid sequences (SEQ ID NO: 27 and SEQ ID NO: 28,
respectively). Dots in the TB4X DNA and protein sequences
indicate that the nucleic acids or amino acid residues are
the same as those represented for TB4Y.

Figure 6 represents the nucleic acid sequences of
EIF1AX (SEQ ID NO: 29) and EIF1AY (SEQ ID NO: 30) and the
encoded amino acid sequences (SEQ ID NO: 31 and SEQ ID NO:
32, respectively).

Figures 7A-7D represent the nucleic acid sequences
of DFFRX (SEQ ID NO: 33) and DFFRY (SEQ ID NO: 34) and the
encoded amino acid sequences (SEQ ID NO: 35 and SEQ ID NO:
36, respectively).

Figure 8 represents the nucleic acid sequences of CDYa
(SEQ ID NO: 37) and CDYb (SEQ ID NO: 38) and the encoded
amino acid sequences (SEQ ID NO: 39 and SEQ ID NO: 40,
respectively).

Figure 9 represents the nucleic acid sequence of BPY1
(SEQ ID NO: 41) and the encoded amino acid sequence (SEQ ID
NO: 42).

Figure 10 represents the nucleic acid sequence of BPY2
(SEQ ID NO: 43) and the encoded amino acid sequence (SEQ ID
NO: 44).

Figure 11 represents the nucleic acid sequence of
XKRY (SEQ ID NO: 45) and the encoded amino acid sequence
(SEQ ID NO: 46).

Figure 12 represents the nucleic acid sequence of
PTPRY (SEQ ID NO: 47) and the encoded amino acid sequence
(SEQ ID NO: 48).

Figure 13 is the nucleic acid sequence of TTY1 (SEQ ID
NO: 49).

Figure 14 is the nucleic acid sequence of TTY2 (SEQ ID
NO: 50).

Figure 15 shows the nucleic acid sequence of the human
CDY Like (CDYL) gene, which is the human autosomal homolog
of CDY, located on chromosome 6p and expressed
ubiquitously.

Figure 16 shows the nucleic acid sequence of the mouse
Cdyl (CDY like) gene, which is the mouse ortholog of human
CDYL, located on chromosome 13 and expressed predominantly
in the testis. A longer transcript of the gene is
ubiquitously expressed.

Figures 17A-17C show the nucleic acid sequences of
human Variably Charged Protein family members VCP2r, VCP8r
and VCP10r, which are expressed in the testis and highly
polymorphic.

Figure 17A is the nucleic acid sequence of VCP2r.

Figure 17B is the nucleic acid sequence of VCP8r.

Figure 17C is the nucleic acid sequence of VCP10r.

DETAILED DESCRIPTION OF THE INVENTION 

Y chromosome genes, classed as genes having X
homologues and testis-specific (Y-specific) genes, are the
subject of the invention described herein, as are DNA which
hybridize to (are complementary to) all or characteristic
portions of the Y chromosome genes, the encoded products
(e.g., proteins, peptides, glycoproteins), antibodies and
methods of diagnosis or treatment in which the genes,
complementary DNA, encoded proteins or antibodies are used.
As described herein, fragments that hybridized to Y
chromosomal DNA were selected and then their nucleotide
sequences determined. It was expected that these sequence
fragments would represent a redundant sampling of a much
smaller set of genes. Computer analysis revealed that 577
fragments corresponded to known Y genes, including seven of
eight NRY genes and all eight pseudoautosomal genes
previously reported. These findings suggested that the
2539 sequence fragments represented the great majority of
all Y-chromosomal genes. After further analysis, both to
eliminate human repetitive sequences and to assemble
overlapping fragments into contigs, 912 novel and
non-overlapping sequences were hybridized to Southern blots
of human genomic DNAs. A total of 308 sequences that
detected at least one prominent male-specific fragment were
judged likely to derive from the NRY, and for each, work was
carried out to isolate cDNA clones from a human testis
library, as described in Example 1. Nucleotide sequencing
of cDNA clones, and rescreening of libraries as necessary,
yielded full-length cDNA sequences for ten novel NRY genes
or families, and partial cDNA sequences for two additional
ones (Table and Figures 1-14).

All 12 novel genes were localized on the Y chromosome,
as described in Example 2. Figure 1 is a gene map of the NRY.
As shown, the Y chromosome consists of a large
non-recombining region (NRY; euchromatin plus
heterochromatin) flanked by pseudoautosomal regions (pter,
short arm telomere; qter, long arm telomere). The NRY is
divided into 43 ordered intervals (1A1A through 7) which
are defined by naturally occurring deletions (D. Vollrath
et al., Science 258:52 (1992)). Listed immediately above
the Y chromosome in Figure 1 are nine NRY genes with
functional X homologs; novel genes are boxed. Indicated
immediately below the Y chromosome are 11 testis-specific
genes or families, some with multiple locations. It is
likely that some testis-specific families have members in
additional deletion intervals; the locations indicated are
representative, but are not necessarily exhaustive. At the
bottom of Figure 1 are shown NRY regions implicated, by
deletion mapping, in sex determination, germ cell
tumorigenesis (gonadoblastoma), stature, and spermatogenic
failure (K. Ma et al., Cell 75:1287 (1993); R. Reijo et
al., Nat. Genet. 10:383 (1995); P. H. Vogt et al., Hum.
Mol. Genet. 5:933 (1996); J. L. Pryor et al., New England
J. Med. 336:534 (1997); K. Tsuchiya et al., Am. J. Hum.
Genet. 57:1400 (1995); P. Salo et al., Hum. Genet. 95:283
(1995)). Euchromatic regions that are made up, at least
partially, of Y-specific repeats are drawn in grey. AMELY,
which appears to fall within such a repeat-containing
region, is actually located in a sub-region of 4A that is
not repetitive.

Expression of the 12 novel genes was assessed in
diverse human tissues by Northern blotting.
Autoradiograms were produced by hybridizing 32P-labeled
cDNA probes to Northern blots of poly(A)+ RNAs (2 µg/lane)
from human tissues (Clontech, Palo Alto, CA). Probes
employed were cDNA clones, full-length (most genes) or
partial (DBY, nucleotides 1476-2319 of GenBank AF000985;
TPRY, nucleotides 861-1768 of GenBank AF000996; DFFRY,
nucleotides 8604-9878 of GenBank AF000986). Blots were
hybridized at 65°C in Church's buffer (0.5 M sodium
phosphate at pH 7.5, with 7% SDS), and washed at 65°C in
1X SSC and 0.1% SDS. DBY, TB4Y, EIF1AY and DFFRY probes
cross-hybridize to transcripts derived from their X
homologs. For all five X-homologous genes (DBY, TPRY, TB4Y,
EIF1AY and DFFRY), expression was tested and confirmed in
three male tissues (brain, prostate and testis) by RT-PCR
using Y-specific primers.

The novel genes encode an assortment of proteins and
are dispersed throughout the euchromatic portions of the
NRY. Nonetheless, all 12 genes fall into two discrete
classes: 1) X-homologous genes and 2) testis-specific,
Y-specific gene families (Table).

The X-homologous genes share the following
characteristics: each has a homolog on the X chromosome
encoding an extremely similar but nonidentical protein
isoform, each is expressed in a wide range of human tissues
(is not testis-specific), and each appears to exist in a
single copy on the NRY. There are five novel
representatives of this X-homologous class:

1. DBY encodes a novel "DEAD box" protein, perhaps an RNA
helicase involved in translation initiation (P. Linder et
al., Nature, 337, 121 (1989); R.-Y. Chuang, P. L. Weaver,
Z. Liu, T.-H. Chang, Science, 275, 1468 (1997)). The DBY
protein is 91% identical to DBX, encoded by a homologous
gene on the human X chromosome.

2. TPRY encodes a novel protein containing 10 tandem "TPR"
motifs, a protein-protein interaction domain found in the
products of the yeast SSN6/CYC8, CDC16, and CDC23 genes,
among others (R. S. Sikorski, M. S. Boguski, M. Goebl, P.
Hieter, Cell, 60, 307 (1990); D. Tzamarias, K. Struhl,
Genes Dev, 9, 821 (1995)). Differential splicing may
generate TPRY isoforms that differ at their carboxy
termini. The amino terminal portion of the TPRY protein is
83% identical to TPRX, encoded by a homologous gene on the
X chromosome.

3. TB4Y encodes a 44 amino acid protein that differs at
only three residues from thymosin β4, which functions in
actin sequestration (H. Gondo et al., J. Immunol. 233:3840
(1987); D. Safer, M. Elzinga, V. T. Nachmias, J Biol Chem,
266, 4029 (1991)), and whose gene we found to be located on
the X chromosome. It is proposed that the X-linked gene
encoding thymosin β4 be called TB4X.

4. EIF1AY encodes a Y-linked isoform of translation
initiation factor 1A (eIF-1A) (T. E. Dever et al., J Biol
Chem, 269, 3212 (1994); J. W. Hershey, Annu. Rev. Biochem.
60, 111 (1991)), whose gene we discovered to be located on
the X chromosome. It is proposed that the X-linked gene
encoding eIF-1A be called EIF1AX. The amino acid sequences
of the X and Y-encoded proteins are 97% identical.

5. DFFRY encodes a Y-linked isoform of DFFRX, a recently
described X-linked protein. A Y-linked homolog was
detected previously, but had been thought to be a
pseudogene. The human DFFRX and DFFRY proteins, which are
91% identical, are homologous to the Drosophila fat-facets
gene product, a deubiquitinating enzyme required for eye
development and oogenesis (M. H. Jones et al., Hum Mol
Genet 5, 1695 (1996); J. A. Fischer-Vize, G. M. Rubin, R.
Lehmann, Development, 116, 985 (1992); Y. Huang, R. T.
Baker, J. A. Fischer-Vize, Science, 270, 1828 (1995)).

The second group of novel NRY genes, the testis-specific,
Y-specific gene families, share a very different
set of characteristics: each appears to be expressed
specifically in testes and each appears to exist in
multiple copies on the NRY, as judged by i) the number and
intensity of hybridizing fragments on genomic Southern
blots or ii) multiple map locations on the Y. We report
five novel testis-specific, Y-specific gene families with
full-length cDNA sequences:

1. The CDY family encodes proteins with an amino-terminal
"chromodomain," a chromatin binding motif (T. C. James, S.
C. Elgin, Mol Cell Biol, 6, 3862 (1986); B. Tschiersch et
al., EMBO J, 13, 3822 (1994); R. Paro, D. S. Hogness, Proc
Natl Acad Sci USA, 88, 263 (1991); D. G. Stokes, K. D.
Tartof, R. P. Perry, Proc Natl Acad Sci USA, 93, 7137
(1996); M. T. Madireddi et al., Cell, 87, 75 (1996))
(Figure 3). The carboxy-terminal half shows striking amino
acid similarity, over a region of more than 200 residues,
to nearly the full length of several enzymes, both
prokaryotic and eukaryotic (M. Kanazawa et al., Enzyme
Protein, 47, 9 (1993); A. Schmitz, K. H. Gartemann, J.
Fiedler, E. Grund, R. Eichenlaub, Appl. Environ. Microbiol.
258, 4068 (1992); Z. L. Boynton, G. N. Bennet, F. B.
Rudolph, J Bacteriol, 178, 3015 (1996); V. Sharma, K.
Suvarna, R. Meganathan, M. E. Hudspeth, J Bacteriol, 174,
5057 (1992); P. M. Palosaari et al., J Biol Chem, 266,
10750 (1991)). The reactions catalyzed by these homologs
are diverse, but in each case the substrate contains
coenzyme A (CoA) attached to a carbonyl group, and an
alkoxide intermediate is formed. The unprecedented
combination of a chromodomain and a putative CoA-substrate
enzyme in a single polypeptide suggests that, in vivo, CDY
proteins may catalyze covalent modification of DNA or
chromosomal proteins, perhaps during spermatogenesis.

2. The BPY1 genes encode a basic protein, 125 residues
long, with little sequence similarity to known proteins.
The encoded protein is rich in serine, lysine, arginine,
and proline and has a pI of 9.4. Southern blotting studies
revealed homologous sequences on the human X chromosome,
but screening of cDNA libraries has failed to yield
X-derived clones.

3. The BPY2 genes encode a second basic protein, 106
residues in length, without obvious sequence similarity to
BPY1 or other known proteins. The pI of BPY2 is 10.0.

4. The XKRY genes encode a protein with sequence
similarity to XK, a putative membrane transport protein
defective in McLeod syndrome (M. Ho et al., Cell, 77, 869
(1994)).

5. The PTPRY genes encode a protein with weak homology to
a putative protein-tyrosine phosphatase (PTPase) in the
mouse (W. Hendriks et al., J Cell Biochem, 59, 418
(1995)).

Two additional families of testis-specific
transcription units, referred to as TTY1 and TTY2, have
been identified. The sequences represented in Figures 14
and 15 are being assessed for open reading frames.

It appears that conventional single-copy genes,
commonplace elsewhere in the genome, are quite uncommon in
the NRY. Indeed, the two classes of NRY genes suggested by
the systematic search described herein accommodate not only
the 12 genes reported here, but also six of eight
previously identified NRY genes. The two exceptions are SRY
and AMELY. SRY, a Y-specific gene that triggers the male
pathway of sexual differentiation, is expressed in testes,
and exists in only one copy in the NRY. AMELY, which has an
X-linked homolog AMELX, is expressed only in the developing
tooth bud. The X inactivation status of AMELX is unknown.

Also described herein are five additional genes and
their sequences (Figures 15, 16, 17A-17C): human CDY
Like (CDYL), which is the human autosomal homolog of CDY,
is located on chromosome 6p and is expressed ubiquitously;
mouse Cdyl (CDY like), which is the mouse ortholog of human
CDYL, is located on chromosome 13, is expressed
predominantly in testis and also has a longer transcript
that is expressed ubiquitously; and the human VCP (Variably
Charged Protein) family is a family of genes on the X
chromosome that are homologous to BPY1, expressed in the
testis and highly polymorphic. Human CDY, human CDYL and
mouse Cdyl have been shown to be histone acetyltransferases
by in vitro assays. Human CDY is a candidate for the
Azoospermia Factor (AZF) because it is within the AZFc
region that is commonly deleted in infertile men. Chemicals
that block the enzymatic activity of the proteins encoded
by any of these genes are candidate male contraceptives.

Inhibitors of the enzymatic activity of the proteins
encoded by these genes, such as the human CDY gene, can be
identified through an in vitro assay. For example, the
protein encoded by one of the genes (e.g., CDY-encoded
protein) can be produced, such as by recombinant means
(e.g., in bacterial cells containing a vector or plasmid
which includes the gene to be expressed), and obtained. The
effect of a candidate inhibitor (drug) on the enzymatic
activity of the protein can be assessed by combining the
candidate inhibitor with the protein, a substrate of its
enzymatic activity (e.g., histones), acetyl CoA (e.g.,
radiolabelled acetyl CoA) and other assay components (e.g.,
an appropriate physiological solution or buffer), to
produce a combination. The combination is maintained under
conditions under which the enzymatic activity of the
protein is maintained and which are appropriate for the
protein to act upon/interact with its substrate (e.g., for
the CDY-encoded protein to retain its histone
acetyltransferase activity). As a result, the substrate is
acted upon by the protein unless the candidate inhibitor
inhibits the protein. If the substrate is not acted upon by
the protein, this is an indication that the candidate
inhibitor is an inhibitor of the protein. For example, if a
histone acetyltransferase, such as CDY-encoded protein, is
inhibited by a candidate inhibitor, its histone
acetyltransferase activity will be blocked. If
radiolabelled acetyl CoA is used, transfer of the
radiolabelled acetyl group to the enzyme substrate
(histones) is inhibited (will not occur or will occur to a
lesser extent than occurs in the absence of the candidate
inhibitor). Whether transfer occurs can be assessed by
determining the location of radiolabelled acetyl groups
from acetyl CoA. If the histone substrates are not
radiolabelled or are radiolabelled to a lesser extent in
the presence of a candidate inhibitor (than in its
absence), the candidate inhibitor is an inhibitor of the
protein. Inhibitors identified in this way can be further
assessed in additional in vitro assays or in in vivo assays
(e.g., in an appropriate animal model).
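
The comparison of radiolabel incorporated into histones in the presence and absence of a candidate inhibitor can be expressed as a percent inhibition. The following is a minimal sketch, assuming that incorporation is read out as counts per minute (cpm) and that a no-enzyme background reading is available; the function names and the 50% cutoff are hypothetical choices for illustration and are not specified in this description.

```python
def percent_inhibition(cpm_with_inhibitor, cpm_without_inhibitor, cpm_background=0.0):
    """Percent reduction in label transferred to the histone substrate.

    All arguments are counts per minute recovered with the histone
    substrate; cpm_background is an assumed no-enzyme control.
    """
    control_signal = cpm_without_inhibitor - cpm_background
    if control_signal <= 0:
        raise ValueError("uninhibited reaction shows no signal above background")
    test_signal = cpm_with_inhibitor - cpm_background
    return 100.0 * (1.0 - test_signal / control_signal)


def is_candidate_inhibitor(cpm_with_inhibitor, cpm_without_inhibitor,
                           cpm_background=0.0, threshold_percent=50.0):
    """Flag a compound whose inhibition meets a hypothetical cutoff."""
    inhibition = percent_inhibition(
        cpm_with_inhibitor, cpm_without_inhibitor, cpm_background
    )
    return inhibition >= threshold_percent


if __name__ == "__main__":
    # Illustrative numbers only, not experimental data.
    print(percent_inhibition(300.0, 1100.0, 100.0))
    print(is_candidate_inhibitor(300.0, 1100.0, 100.0))
```

In practice the cutoff would be chosen from replicate measurements, and compounds passing it would proceed to the additional in vitro or in vivo assays mentioned above.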

To interpret the observation that these X-homologous
and multi-copy, testis-specific groups account for 18 of 20
known NRY genes or families, we postulate that the NRY's
evolution was dominated by two strategies. The first
strategy favors conservation of certain existing genes and
the second favors the acquisition of a class of novel
genes: 1) The X-homologous genes probably reflect the
common ancestry of the X and Y chromosomes, and selective
pressures to maintain comparable expression of genes in
males and females. 2) The abundance of testis-specific
families may have resulted from the NRY's selectively
retaining and amplifying genes that enhance male
reproductive fitness.

1) Dosage compensation and X-Y homology. Experts
agree that the mammalian X and Y chromosomes evolved from
autosomes, with nearly all ancestral gene functions
deteriorating on the non-recombining portion of the
emerging Y chromosome while being maintained on the nascent
X chromosome (J. J. Bull, Evolution of Sex Determining
Mechanisms (Benjamin Cummings, Menlo Park, CA, 1983); J. A.
Graves, Annu. Rev. Genet. 30:233 (1996); B. Charlesworth,
Curr. Biol. 6:149 (1996); W. R. Rice, Bioscience 46:331
(1996)). Functional degeneration of the NRY would result
in females having two, but males only one, copy of many
genes, creating the need for a mechanism to equalize
X-linked gene expression in the sexes. In mammals, a
predominant solution to this problem is provided by X
inactivation, the transcriptional silencing of one X
chromosome in females.

However, the findings on X-homologous NRY genes
described herein, combined with previous studies,
illustrate the importance in human evolution of an
alternative solution: preservation of homologous genes on
both the NRY and the X chromosome, with both male and
female cells expressing two copies of such genes. A
critical prediction of this model is that, in female cells,
the X homologs should escape X inactivation. This is the
case for all widely expressed X-linked genes with known NRY
homologs, including the X homologs of five novel NRY genes
reported here (E. M. Fisher et al., Cell 63:1205 (1990);
A. I. Agulnik et al., Hum. Mol. Genet. 3:879 (1994); M. H.
Jones et al., Hum. Mol. Genet. 5:1695 (1996); J. A.
Fischer-Vize et al., Development 116:985 (1992); Y. Huang
et al., Science 270:1828 (1995); A. Schneider-Gadicke et
al., Cell 57:1211 (1989)). A second prediction of this
model is that the human X and Y encoded proteins should be
functionally interchangeable even though the nucleotide
sequences of their corresponding genes are considerably
diverged. Indeed, each of the eight known X-NRY gene pairs
encodes closely related isoforms, with 83 to 97% amino acid
identity throughout their lengths; functional
interchangeability has been demonstrated in the one case
tested to date (M. Watanabe et al., Nat. Genet. 4:268
(1993)).
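
The percent amino acid identity figures quoted for the X-NRY pairs can be illustrated with a short calculation over two aligned sequences. This is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length, gaps are written as "-", and identity is computed over positions where neither sequence has a gap. The alignment method and the identity convention actually used to obtain the values reported herein are not stated in this description, and the sequences in the usage line are made-up toy strings, not DBX/DBY or any other real protein.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared (ungapped) alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # columns containing a gap are not scored
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment for illustration only.
    print(round(percent_identity("MKTAY-LV", "MKSAYQLV"), 1))
```

Other conventions (for example, dividing by the length of the shorter protein or by the full alignment length) give somewhat different values, so the denominator should be stated whenever identities are compared.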

Turner syndrome is classically associated with an XO
sex chromosome constitution. In 1965, Ferguson-Smith
postulated that the Turner phenotype might be due to
inadequate expression of X-Y common genes that escape X
inactivation (M. A. Ferguson-Smith, J. Med. Genet. 2:142
(1965)). These "Turner genes" have yet to be identified
with certainty. However, there now exists a substantial
collection of X-homologous NRY genes (Figure 1) which can
be assessed for genes which contribute to or are
responsible for the Turner phenotype. The potential role
of RPS4Y and RPS4X in Turner syndrome is controversial (E.
M. Fisher et al., Cell 63:1205 (1990); W. Just et al., Hum.
Genet. 89:240 (1992)). At least one Turner gene maps to
the Xp-Yp pseudoautosomal region (T. Ogata et al., J. Med.
Genet. 30:918 (1993)). Seven of the eight known X-NRY gene
pairs appear to be ubiquitously expressed, and at least
three encode housekeeping proteins: an essential ribosomal
protein (RPS4), an essential translation initiation factor
(eIF-1A), and a modulator of actin polymerization (thymosin
β4). Perhaps some features of the XO phenotype (e.g., poor
fetal viability) reflect inadequate expression of such
housekeeping functions.

2) Male fitness and Y-specific, testis-specific
genes. As first appreciated by R. A. Fisher, animal genomes
may contain genes or alleles that enhance male reproductive
fitness but are inconsequential or detrimental with respect
to female fitness (R. A. Fisher, Biol. Rev. 6:345 (1931)).
As Fisher recognized, selective pressures would tend to
favor the accumulation of such genes in male-specific
regions of genomes. Of course, male reproductive fitness
depends critically on sperm production, the central task of
the adult testis. Since the NRY is the only male-specific
portion of the mammalian genome, it should have a unique
tendency to accumulate male-benefit genes during evolution.

These principles are illustrated by several gene
families on the human NRY. De novo deletions of the DAZ
gene cluster on the human Y chromosome are associated with
severe spermatogenic defects (R. Reijo et al., Nat. Genet.
10:383 (1995)), and in Drosophila the DAZ homolog boule is
required for spermatogenesis (C. G. Eberhart et al., Nature
381:183 (1996)). The DAZ gene cluster on the human Y
chromosome arose, during primate evolution, by
transposition and amplification of an autosomal gene.
Likewise, two other testis-specific NRY gene families
(YRRM and TSPY) may also be the result of the Y
chromosome's having acquired and amplified autosomal genes
(R. Saxena et al., Nat. Genet. 14:292 (1996); M. L.
Delbridge et al., Nat. Genet. 15:131 (1997)). It is
possible that the selective advantage conferred by the
NRY's retaining and amplifying male fertility factors (from
throughout the genome) accounts for the multitude of
testis-specific gene families there. This may have been
the preeminent force in shaping the NRY's gene repertoire,
as it appears that the great majority of NRY transcription
units are members of such testis-specific families. In the
NRY, each of the testis-specific gene families has multiple
members, 20 to 40 copies in the case of TSPY (E. Manz et
al., Genomics 17:726 (1993)), and perhaps as many as 20
copies in the case of YRRM (K. Ma et al., Cell 75:1287
(1993)). All together, the various Y-specific gene
families may include as many as several hundred genes or
copies. Though it is not known how many of these are
functional, it seems likely that Y-specific,
testis-specific gene families comprise the great majority
of NRY transcription units.

Recent genetic studies underscore the importance of
the human Y chromosome in fertility. Many men with
spermatogenic failure, but who are otherwise healthy, have
deletions of portions of the NRY (K. Ma et al., Cell
75:1287 (1993); R. Reijo et al., Nat. Genet. 10:383 (1995);
P. H. Vogt et al., Hum. Mol. Genet. 5:933 (1996); J. L. Pryor
et al., New England J. Med. 336:534 (1997)). These
findings suggested the existence of NRY genes that play
critical roles in male germ cell development but are not
required elsewhere in the body. Previous deletion mapping
studies have implicated four regions of the NRY in either
spermatogenic failure or germ cell tumorigenesis, and in
each of these four regions we now report novel candidate
genes expressed specifically, or most abundantly, in testes
(Figure 1). As shown in Figure 1, the regions implicated in
gonadoblastoma, stature and spermatogenic failure all
contain novel candidate genes. Two of the three regions
implicated in spermatogenic failure each contain one or
more novel testis-specific genes. The third region
implicated in spermatogenic failure (intervals 5B-5D)
contains two X-homologous genes, DBY and EIF1AY, with
abundant, testis-specific transcripts in addition to
higher-molecular-weight, ubiquitous transcripts.

While X-homologous and testis-specific genes are
somewhat intermingled within the NRY, clustering is evident
(Figure 1). The geographic distribution of the two classes
correlates quite well with previously identified sequence
domains within the euchromatic NRY (D. Vollrath et al.,
Science 258:52 (1992); S. Foote et al., Science 258:60
(1992)). Ten of the 11 known testis-specific families map
to previously identified regions of Y-specific repetitive
sequences. The only exception is BPY1, which
cross-hybridizes to the X chromosome and maps to a
previously recognized region of X homology. Indeed, one or
more testis-specific gene families are found in nearly all
known regions of euchromatic Y repeats (Figure 1).
Ironically, it had been widely assumed that these regions
consisted of "junk" DNA, partly on theoretical grounds (B.
Charlesworth, Science 251:1030 (1991); E. Seboun et al.,
Cold Spring Harb. Symp. Quant. Biol. 51:237 (1986)). To the
contrary, the results presented here argue that these
Y-specific repetitive regions contain the great majority of
the NRY's transcription units (the only exception is BPY1,
which cross-hybridizes to the X chromosome and maps to a
previously recognized region of X homology). These regions
may be the result of rampant gene amplification during
mammalian evolution. By contrast, none of the eight
X-homologous genes map to the Y-repeat regions; all eight
map to regions previously identified as consisting largely
of single-copy (or in some cases X-homologous) sequences.
It is possible that, early in mammalian evolution, these
regions of the NRY shared extensive sequence identity with
the nascent X chromosome. The stage is now set for
systematic evolutionary, biochemical and cell biological
studies of the NRY, an idiosyncratic segment of the human
genome.

The present invention relates to isolated DNA and
genes, present on (which occur on) the Y chromosome, whose
sequences are provided herein, as well as characteristic
portions of the DNA. It relates to additional nucleic
acid/nucleotide sequences which are not identical to the
sequences presented herein but include substitutions or
differences; DNA which includes substitutions or
differences and encodes the same amino acid sequence as a
DNA whose sequence is provided herein or includes
substitutions which do not alter the ability of a DNA probe
or primer which hybridizes to DNA whose sequence is
presented herein to hybridize to the DNA containing the
substitutions or differences. It further relates to DNA
which encodes a protein or peptide whose sequence is
presented herein. The present invention also includes the
complements of the DNA sequences presented herein, DNA
which hybridizes under stringent (high stringency)
conditions to the DNA whose sequences are presented and to
RNA transcripts. The invention further relates to encoded
proteins, peptides and other products (e.g., glycoproteins)
and antibodies which are raised against or bind to proteins
or peptides whose amino acid sequences are presented herein
or are encoded by DNA whose sequences are provided. As
used herein, the term isolated DNA which occurs on the
non-recombining region of the human Y chromosome refers to DNA



which has been obtained or removed from the human Y
chromosome or DNA, produced by any means (e.g., recombinant
techniques, synthetic methods), which has the sequence of
such Y chromosome DNA. For example, isolated
testis-specific DNA or isolated testis-specific DNA which occurs
on the non-recombining region of the human Y chromosome is
DNA which has been obtained or removed from the
non-recombining region of the human Y chromosome or which has
the sequence of such DNA and has been obtained or produced
by any means.
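
The statement above that a DNA may include substitutions and still
encode the same amino acid sequence can be illustrated with a minimal
Python sketch. It translates two short sequences with the standard
genetic code and compares the resulting peptides. Both sequences, and
the names translate and count_differences, are hypothetical and are
used here for illustration only; they are not taken from the sequence
listing.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_differences(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical reference sequence and a variant carrying only
# third-position (synonymous) substitutions.
REFERENCE = "ATGAGTCATGTGGTGGTGAAA"
VARIANT = "ATGAGCCACGTCGTAGTTAAG"

if __name__ == "__main__":
    print(translate(REFERENCE), translate(VARIANT))
    print(count_differences(REFERENCE, VARIANT), "nucleotide differences")
```

In this sketch the two sequences differ at several nucleotide
positions yet translate to the same peptide, which is the situation
the paragraph above describes for DNA that "includes substitutions or
differences and encodes the same amino acid sequence".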

Thus, this invention has application to several areas. 
It may be used diagnostically to identify males with 
reduced sperm count in whom a gene has been deleted or 
altered. It may also be used therapeutically in gene 
therapy treatments to remedy fertility disorders associated 
with deletion or alteration of a gene described. In one 
embodiment of a gene therapy method, a gene described 
herein, or a gene portion which encodes a functional 
protein, is introduced into a man whose sperm count is 
reduced and in whom the gene is expressed and the encoded 
protein replaces the protein normally produced or enhances 
the quantity produced. The present invention may also be 
useful in designing or identifying agents which function as 
a male contraceptive by inducing reduced sperm count. This 
invention also has application as a research tool, as the 
nucleotide sequences described herein have been localized 
to regions of the Y chromosome.

The present invention includes nucleotide sequences
described herein, and their complements, which are useful
as hybridization probes or primers for an amplification
method, such as polymerase chain reaction (PCR), to show
the presence, absence or disruption of the gene of the
present invention. Probes and primers can have all or a
portion of the nucleotide sequence (nucleic acid sequence)
of a gene described herein or all or a portion of its
complement. For example, sequences shown in the Figures or
Example 2 (SEQ ID NOS.: 1-84), as well as the complements
thereof, can be used. The probes and primers can be any
length, provided that they are of sufficient length and
appropriate composition (appropriate nucleotide sequence)
to hybridize to all or an identifying or characteristic
portion of the gene described or to a disrupted form of the
gene, and remain hybridized under the conditions used.
Useful probes include, but are not limited to, nucleotide
sequences which distinguish between a gene described herein
and an altered form of that gene shown to be associated
with reduced sperm count (azoospermia, oligospermia).
Generally, the probe will be at least 7 nucleotides, while
the upper limit is the length of the gene itself, e.g., up
to about 40,000 nucleotides in length. Probes can be, for
example, 10 to 14 nucleotides or longer (e.g., 20, 30, 50,
100, 250 nucleotides or any other useful length); the
length of a specific probe will be determined by the assay
in which it is used.

In one embodiment, the present invention is a method
of diagnosing or aiding in the diagnosis of reduced sperm
count associated with deletion or alteration of a gene
described herein. Any man may be assessed with this method
of diagnosis. In general, the man will have been at least
preliminarily assessed, by another method, as having a
reduced sperm count. By combining nucleic acid probes
derived either from the isolated native sequence or cDNA
sequence of the gene, or from appropriate primers, with the
DNA from a sample to be assessed, under conditions suitable
for hybridization of the probes with unaltered
complementary nucleotide sequences in the sample but not
with altered complementary nucleotide sequences, it can be
determined whether the man possesses the intact gene. If
the gene is unaltered, it may be concluded that the
alteration of the gene is not responsible for the reduced
sperm count. This invention may also be used in a similar
method wherein the hybridization conditions are such that
the probes will hybridize only with altered DNA and not
with unaltered sequences. The hybridized DNA can also be
isolated and sequenced to determine the precise nature of
the alteration associated with the reduced sperm count.
DNA assessed by the present method can be obtained from a
variety of tissues and body fluids, such as blood or semen.
In one embodiment, the above methods are carried out on DNA
obtained from a blood sample.
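
The logic of a probe that hybridizes to an unaltered sequence but not
to an altered one can be sketched in Python as follows. This is a
deliberately simplified model, not a description of the assay
chemistry: hybridization is approximated as base pairing between the
probe and some window of the target strand with at most a permitted
number of mismatches, and "stringent" is taken to mean zero
mismatches. The sequences INTACT and ALTERED, the function names and
the max_mismatches parameter are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str, max_mismatches: int = 0) -> bool:
    """Model hybridization of a probe to a single target strand.

    The probe pairs with the stretch of target equal to the probe's
    reverse complement; up to max_mismatches mispaired bases are allowed.
    """
    site = reverse_complement(probe)
    target = target.upper()
    n = len(site)
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        mismatches = sum(1 for x, y in zip(site, window) if x != y)
        if mismatches <= max_mismatches:
            return True
    return False


# Hypothetical intact sequence and an altered form with one substitution.
INTACT = "GGCTCAAAATCCACTGACGTTAGC"
ALTERED = "GGCTCAAAATCTACTGACGTTAGC"

# A 17-nucleotide probe complementary to the intact sequence.
PROBE = reverse_complement(INTACT[3:20])

if __name__ == "__main__":
    print("intact, stringent: ", hybridizes(PROBE, INTACT))
    print("altered, stringent:", hybridizes(PROBE, ALTERED))
    print("altered, relaxed:  ", hybridizes(PROBE, ALTERED, max_mismatches=1))
```

Under this model, lowering the permitted mismatch count plays the role
of raising stringency: the same probe that detects both forms under
relaxed conditions distinguishes the intact from the altered form
under stringent ones.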

The invention also provides expression vectors
containing a nucleotide (nucleic acid) sequence described
herein, which is operably linked to at least one regulatory
sequence. "Operably linked" is intended to mean that the
nucleotide sequence is linked to a regulatory sequence in a
manner which allows expression of the nucleotide sequence.
The term "regulatory sequence" includes promoters,
enhancers, and other expression control elements (see,
e.g., Goeddel, Gene Expression Technology: Methods in
Enzymology 185, Academic Press, San Diego, CA (1990)). It
should be understood that the design of the expression
vector may depend on such factors as the choice of the host
cell to be transformed and/or the protein or peptide
desired to be expressed. For instance, the peptides of the
present invention can be produced by ligating the cloned
gene, or a portion thereof, into a vector suitable for
expression in either prokaryotic cells, eukaryotic cells or
both (see, for example, Broach, et al., Experimental
Manipulation of Gene Expression, ed. M. Inouye (Academic
Press, 1983) p. 83; Molecular Cloning: A Laboratory
Manual, 2nd Ed., ed. Sambrook et al. (Cold Spring Harbor
Laboratory Press, 1989) Chapters 16 and 17).

Prokaryotic and eukaryotic host cells transfected by
the described vectors are also provided by this invention.
For instance, cells which can be transfected with the
vectors of the present invention include, but are not
limited to, bacterial cells such as E. coli, insect cells
(baculovirus), yeast and mammalian cells, such as Chinese
hamster ovary cells (CHO).

Thus, a nucleotide sequence described herein can be
used to produce a recombinant form of the protein via
microbial or eukaryotic cellular processes. Production of
a recombinant form of the protein can be carried out using
known techniques, such as by ligating the oligonucleotide
sequence into a DNA or RNA construct, such as an expression
vector, and transforming or transfecting the construct into
host cells, either eukaryotic (yeast, avian, insect or
mammalian) or prokaryotic (bacterial cells). Similar
procedures, or modifications thereof, can be employed to
prepare recombinant proteins according to the present
invention by microbial means or tissue-culture technology.

The present invention also pertains to pharmaceutical
compositions comprising the proteins and peptides described
herein. For instance, the peptides or proteins of the
present invention can be formulated with a physiologically
acceptable medium to prepare a pharmaceutical composition.
The particular physiological medium may include, but is not
limited to, water, buffered saline, polyols (e.g.,
glycerol, propylene glycol, liquid polyethylene glycol) and
dextrose solutions. The optimum concentration of the
active ingredient(s) in the chosen medium can be determined
empirically, according to procedures well known to
medicinal chemists, and will depend on the ultimate
pharmaceutical formulation desired. Methods of
introduction of exogenous polypeptides at the site of
treatment include, but are not limited to, intradermal,
intramuscular, intraperitoneal, intravenous, subcutaneous,
oral and intranasal. Other suitable methods of
introduction can also include rechargeable or biodegradable
devices and slow release polymeric devices. The
pharmaceutical compositions of this invention can also be
administered as part of a combinatorial therapy with other
agents.

This invention also has utility in methods of treating 
disorders of reduced sperm count associated with deletion 
or alteration of a gene described herein. These genes may 
be used in a method of gene therapy, whereby the gene or a 
gene portion encoding a functional protein is inserted into 
cells in which the functional protein is expressed and from 
which it is generally secreted to remedy the deficiency 
caused by the defect in the native gene. 

The present invention is also related to antibodies 
which bind a protein or peptide encoded by all or a portion 
of a gene of the present invention, as well as antibodies 
which bind the protein or peptide encoded by all or a 
portion of a disrupted form of the gene. For instance, 
polyclonal and monoclonal antibodies which bind to the 
described polypeptide or protein are within the scope of 
the invention. A mammal, such as a mouse, hamster or 
rabbit, can be immunized with an immunogenic form of the 
protein or peptide (an antigenic fragment of the protein or 
peptide which is capable of eliciting an antibody 
response). Techniques for conferring immunogenicity on a
protein or peptide, including conjugation to carriers, are
well known in the art. The protein or
peptide can be administered in the presence of an adjuvant. 
The progress of immunization can be monitored by detection 
of antibody titers in plasma or serum. Standard ELISA or 
other immunoassays can be used with the immunogen as 
antigen to assess the levels of antibody. 

Following immunization, anti-peptide antisera can be
obtained, and if desired, polyclonal antibodies can be
isolated from the serum. Monoclonal antibodies can also be
produced by standard techniques which are well known in the
art (Koehler and Milstein, Nature 256:495-497 (1975);
Kozbor et al., Immunology Today 4:72 (1983); and Cole et
al., Monoclonal Antibodies and Cancer Therapy, Alan R.
Liss, Inc., pp. 77-96 (1985)). Such antibodies are useful
as diagnostics for the intact or disrupted gene and also as
research tools for identifying either the intact or
disrupted gene.

The present invention is illustrated by the following 
examples, which are not intended to be limiting in any way. 

EXAMPLE 1 ISOLATION OF cDNA CLONES FROM HUMAN TESTIS
LIBRARY

"cDNA selection" (M. Lovett et al., Proc. Natl. Acad.
Sci. USA 88:9628 (1991)) was carried out using bulk cDNA
prepared from human adult testes (Clontech, Palo Alto, CA)
and, as selector, a cosmid library prepared from
flow-sorted Y chromosomes (Lawrence Livermore National
Laboratory: LL0YNC03). A total of 3600 random cosmids,
providing nearly five-fold coverage of the 30-Mb
euchromatic region, were used to generate 150 pools of
selector DNA. Using each of the 150 selector pools, we
carried out four successive rounds of cDNA selection,
followed by two rounds of subtraction with human COT-1 DNA
(Gibco BRL, Gaithersburg, MD) to remove highly repetitive
sequences. A plasmid library was prepared from each of the
150 resulting pools of selected cDNA fragments, and 24
clones from each library were sequenced from one end. Of
the 3600 sequences generated, about 600 were of poor
technical quality and about 500 were found to derive from
cloning vector or E. coli host, leaving 2539 sequences for
further analysis. Of the 2539 sequence fragments, 536
corresponded to previously reported NRY genes (487 to TSPY,
15 to YRRM, 14 to RPS4Y, 9 to SMCY, 5 to DAZ, 3 to SRY, 3
to ZFY) and 41 corresponded to previously reported
pseudoautosomal genes (15 to XE7, 11 to CSF2RA, 4 to IL3RA,
3 to ASMT, 3 to IL9R, 2 to ANT3, 2 to MIC2, 1 to SYBL1).
Electronic analysis of the roughly 2000 remaining sequences
revealed that about 200 contained known repetitive
elements, and these were not pursued. By electronically
identifying redundancies and sequence overlaps, the
remaining sequences were reduced to 1093 sequence contigs.
Sequences representing these 1093 contigs were individually
hybridized to dot-blotted yeast genomic DNAs of 60 YACs
comprising most of the Y's euchromatic region (S. Foote et
al., Science 258:60 (1992)). 181 sequences that hybridized
to the great majority of the YACs were judged likely to
contain highly repeated elements and were not pursued,
leaving 912 sequences for further analysis. The 912
sequences were individually hybridized to Southern blots of
EcoRI-digested human 46,XX female and 49,XYYYY male (L. Sirota
et al., Clin. Genet. 19:87 (1981)) genomic DNAs. Blots
were hybridized at 65°C in Church's buffer (0.5 M NaPO4 at
pH 7.5, with 7% SDS), and washed at 65°C in 1X SSC and 0.1%
SDS, with 832 hybridizations yielding interpretable
results. Many sequences appeared to contain highly
repeated elements common to males and females, or failed to
detect an unambiguously Y-specific restriction fragment,
and these were not pursued. By contrast, 308 sequences
hybridized to at least one prominent fragment present in
49,XYYYY but absent in 46,XX, suggesting that these
sequences derived from the NRY. Each of these 308
sequences was individually used to screen, by
hybridization, about 2 million plaques from a λ phage
library of human adult testis cDNA (Clontech, Palo Alto,
CA).
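
The counts reported in this Example can be cross-checked with a short
Python sketch. All counts are taken from the text above; the cosmid
insert size of 40 kb is an assumption (a typical value, not stated in
the text) used only to show how 3600 cosmids correspond to nearly
five-fold coverage of a 30-Mb region.

```python
# Library coverage. COSMID_INSERT_KB is an assumed typical insert size.
COSMIDS = 3600
COSMID_INSERT_KB = 40
EUCHROMATIC_MB = 30
coverage = COSMIDS * COSMID_INSERT_KB / (EUCHROMATIC_MB * 1000)

# Sequencing effort: 24 clones from each of 150 libraries.
sequences_generated = 150 * 24

# Hits to previously reported genes, as itemized in the text.
known_nry = {"TSPY": 487, "YRRM": 15, "RPS4Y": 14, "SMCY": 9,
             "DAZ": 5, "SRY": 3, "ZFY": 3}
pseudoautosomal = {"XE7": 15, "CSF2RA": 11, "IL3RA": 4, "ASMT": 3,
                   "IL9R": 3, "ANT3": 2, "MIC2": 2, "SYBL1": 1}

usable = 2539
remaining = usable - sum(known_nry.values()) - sum(pseudoautosomal.values())

# Later filtering steps.
contigs = 1093
after_yac_filter = contigs - 181
interpretable = 832
y_specific = 308

if __name__ == "__main__":
    print(f"coverage: {coverage:.1f}-fold")
    print(f"sequences generated: {sequences_generated}")
    print(f"known NRY hits: {sum(known_nry.values())}")
    print(f"pseudoautosomal hits: {sum(pseudoautosomal.values())}")
    print(f"remaining after known genes: {remaining}")
    print(f"contigs after YAC filter: {after_yac_filter}")
    print(f"Y-specific fraction: {y_specific / interpretable:.2f}")
```

The itemized gene counts sum to the stated totals of 536 and 41, and
the remainder after removing them is consistent with the "roughly
2000" sequences carried forward. The figures of "about 600" and
"about 500" discarded sequences are approximate in the text and are
therefore not used in the check.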



EXAMPLE 2 LOCALIZATION OF 12 NOVEL GENES ON THE Y 
CHROMOSOME 

Genes were localized on a previously reported NRY
deletion map by testing with PCR for their presence or
absence in individuals carrying partial Y chromosomes (D.
Vollrath et al., Science 258:52 (1992)). Most genes were
localized to a single deletion interval. Some genes could
not be unambiguously placed because copies exist in
multiple locations in the NRY. In such cases, genes were
localized by PCR testing of YACs encompassing the NRY's
euchromatic region (S. Foote et al., Science 258:60
(1992)). X homologs of Y genes were mapped onto the X by
PCR testing a panel of human/rodent somatic hybrid cell
lines (Research Genetics, Huntsville, AL). All PCR assays
consisted of 30 cycles of the following conditions: 1 min
denaturing at 94°C, 45 sec annealing at 60°C, and 45 sec
extension at 72°C. TB4X primers were designed from an
unreported intron. TPRX primers were designed from
unreported cDNA sequence. All other primers were designed
from cDNA sequences as submitted to Genbank. PCR primers
were as follows:



GENE     LEFT PRIMER                   RIGHT PRIMER
DBY      CATTCGGTTTTACCAGCCAG          CAGTGACTCGAGGTTCAATG
         (SEQ ID NO.: 51)              (SEQ ID NO.: 52)
TPRY     GCATCATAATATGGATCTAGTAGG      GGAGATACTGAATAGCATAGC
         (SEQ ID NO.: 53)              (SEQ ID NO.: 54)
TB4Y     CAAAGACCTGCTGACAATGG          CTCCGCTAAGTCTTTCACC
         (SEQ ID NO.: 55)              (SEQ ID NO.: 56)
EIF1AY   CTCTGTAGCCAGCCTCTTC           GACTCCTTTCTGGCGGTTAC
         (SEQ ID NO.: 57)              (SEQ ID NO.: 58)
DFFRY    GAGCCCATCTTTGTCAGTTTAC        CTGCCAATTTTCCACATCAACC
         (SEQ ID NO.: 59)              (SEQ ID NO.: 60)
CDY      GGCTCAAAATCCACTGACG           CAAGCGATATCTCACCACC
         (SEQ ID NO.: 61)              (SEQ ID NO.: 62)
BPY1     CTCCCTGAGCAGCAACTAAG          GTCATCAACATGGGAAGCAC
         (SEQ ID NO.: 63)              (SEQ ID NO.: 64)
BPY2     CCAGGACCATGTGATATGG           CTAATTCCCTCTTTACGCATGACC
         (SEQ ID NO.: 65)              (SEQ ID NO.: 66)
XKRY     CACTCATGGAGAAGGGTAGG          GTCACACTCAGCCTCTTTAC
         (SEQ ID NO.: 67)              (SEQ ID NO.: 68)
PTPRY    GAGCACACCACACCAGAAAC          CTCAGACTGACCTCGGACTG
         (SEQ ID NO.: 69)              (SEQ ID NO.: 70)
TTY1     CTCTGGGAATCAAATTCGAGG         GTCTTTCAGCCAATCCAAGG
         (SEQ ID NO.: 71)              (SEQ ID NO.: 72)
TTY2     GACAACTCTGACAGCCAGG           GTCAGAACTCCCAAACAGG
         (SEQ ID NO.: 73)              (SEQ ID NO.: 74)
DBX      CTACATGCAGATGACATGGTG         GGCCAAGGTGCATAGGTG
         (SEQ ID NO.: 75)              (SEQ ID NO.: 76)
TPRX     CATGTTCCCTGTAGCACATC          CGTTTCCATTACTTCCATTTCCTG
         (SEQ ID NO.: 77)              (SEQ ID NO.: 78)
TB4X     CCCGCCCTTTCATCATCC            GCTCCCCAAAGTAGCCTTC
         (SEQ ID NO.: 79)              (SEQ ID NO.: 80)
EIF1AX   CACGAGGCGCCATTTGCTG           CTGGAGGCCAGGCAACGTG
         (SEQ ID NO.: 81)              (SEQ ID NO.: 82)
DFFRX    CCTCCACCTGAAGATGCC            CTGAGATCCAGGTGAATGG
         (SEQ ID NO.: 83)              (SEQ ID NO.: 84)
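
As a rough illustration of how primers such as those listed above
relate to the stated cycling conditions, the following Python sketch
computes length, GC content and an approximate melting temperature for
three of the left primers, together with the total programmed time for
30 cycles. The melting temperature uses the simple rule of 2°C per A
or T and 4°C per G or C, which is a coarse approximation introduced
here for illustration and is not part of the original disclosure; the
function names are likewise hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Left primers copied from the table above.
PRIMERS = {
    "DBY": "CATTCGGTTTTACCAGCCAG",
    "TPRY": "GCATCATAATATGGATCTAGTAGG",
    "CDY": "GGCTCAAAATCCACTGACG",
}

# Cycling conditions from the text: 1 min at 94, 45 sec at 60, 45 sec at 72.
CYCLES = 30
STEP_SECONDS = {"denature": 60, "anneal": 45, "extend": 45}
total_seconds = CYCLES * sum(STEP_SECONDS.values())

if __name__ == "__main__":
    for gene, seq in PRIMERS.items():
        print(f"{gene}: {len(seq)} nt, GC {gc_fraction(seq):.2f}, "
              f"Tm ~{wallace_tm(seq)} C")
    print(f"programmed cycling time: {total_seconds / 60:.0f} min")
```

The cycling time computed here covers only the programmed hold steps;
temperature ramping between steps would add to it on a real
instrument.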



EQUIVALENTS 



Those skilled in the art will recognize, or be able
to ascertain using no more than routine experimentation,
many equivalents to the specific embodiments of the
invention described herein. Such equivalents are intended
to be encompassed by the following claims.

CLAIMS

We claim:

1. Isolated testis-specific DNA which occurs on the
non-recombining region of the human Y chromosome or the
complement thereof.

2. The isolated testis-specific DNA of Claim 1 which
occurs in multiple copies on the non-recombining region
of the human Y chromosome or the complement thereof.

3. The isolated testis-specific DNA of Claim 2 selected
from the group consisting of:

(a) a CDY gene or a characteristic portion thereof;
(b) a BPY1 gene or a characteristic portion thereof;
(c) a BPY2 gene or a characteristic portion thereof;
(d) an XKRY gene or a characteristic portion thereof;
(e) a PTPRY gene or a characteristic portion thereof;
(f) TTY1 DNA or a characteristic portion thereof;
(g) TTY2 DNA or a characteristic portion thereof;
(h) a complement of (a);
(i) a complement of (b);
(j) a complement of (c);
(k) a complement of (d);
(l) a complement of (e);
(m) a complement of (f);
(n) a complement of (g);
(o) DNA encoding the amino acid sequence of SEQ ID
No.: 39;
(p) DNA encoding the amino acid sequence of SEQ ID
No.: 40;
(q) DNA encoding the amino acid sequence of SEQ ID
No.: 42;
(r) DNA encoding the amino acid sequence of SEQ ID
No.: 44;
(s) DNA encoding the amino acid sequence of SEQ ID
No.: 46;
(t) DNA encoding the amino acid sequence of SEQ ID
No.: 48; and
(u) DNA which hybridizes to a DNA of any one of (a)
through (t) under stringent conditions.

4. Isolated testis-specific DNA selected from the group
consisting of:

(a) DNA of SEQ ID No.: 37;
(b) DNA of SEQ ID No.: 38;
(c) DNA of SEQ ID No.: 41;
(d) DNA of SEQ ID No.: 43;
(e) DNA of SEQ ID No.: 45;
(f) DNA of SEQ ID No.: 47;
(g) DNA of SEQ ID No.: 49;
(h) DNA of SEQ ID No.: 50;
(i) DNA encoding the amino acid sequence of SEQ ID
No.: 39;
(j) DNA encoding the amino acid sequence of SEQ ID
No.: 40;
(k) DNA encoding the amino acid sequence of SEQ ID
No.: 42;
(l) DNA encoding the amino acid sequence of SEQ ID
No.: 44;
(m) DNA encoding the amino acid sequence of SEQ ID
No.: 46;
(n) DNA encoding the amino acid sequence of SEQ ID
No.: 48;
(o) a complement of a DNA of any one of (a) through
(n); and
(p) DNA which hybridizes to a DNA of any one of (a)
through (o) under stringent conditions.

5. Isolated X-homologous DNA which occurs on the
non-recombining region of the human Y chromosome, is not
testis-specific and has a homolog on the human X
chromosome.

6. The isolated DNA of Claim 5 selected from the group
consisting of:

(a) a DBY gene or a characteristic portion thereof;
(b) a TPRY gene or a characteristic portion thereof;
(c) a TB4Y gene or a characteristic portion thereof;
(d) an EIF1AY gene or a characteristic portion
thereof;
(e) a DFFRY gene or a characteristic portion
thereof;
(f) a complement of (a);
(g) a complement of (b);
(h) a complement of (c);
(i) a complement of (d);
(j) a complement of (e);
(k) a complement of (f);
(l) DNA encoding the amino acid sequence of SEQ ID
No.: 18;
(m) DNA encoding the amino acid sequence of SEQ ID
No.: 22;
(n) DNA encoding the amino acid sequence of SEQ ID
No.: 23;
(o) DNA encoding the amino acid sequence of SEQ ID
No.: 24;
(p) DNA encoding the amino acid sequence of SEQ ID
No.: 28;
(q) DNA encoding the amino acid sequence of SEQ ID
No.: 32;
(r) DNA encoding the amino acid sequence of SEQ ID
No.: 36; and
(s) DNA which hybridizes to a DNA of any one of (a)
through (r) under stringent conditions.

7. Isolated X-homologous human DNA selected from the group
consisting of:

(a) DNA of SEQ ID No.: 17 or a characteristic portion
thereof;
(b) DNA of SEQ ID No.: 19 or a characteristic
portion thereof;
(c) DNA of SEQ ID No.: 20 or a characteristic
portion thereof;
(d) DNA of SEQ ID No.: 21 or a characteristic
portion thereof;
(e) DNA of SEQ ID No.: 26 or a characteristic
portion thereof;
(f) DNA of SEQ ID No.: 30 or a characteristic
portion thereof;
(g) DNA of SEQ ID No.: 34 or a characteristic
portion thereof;
(h) DNA encoding the amino acid sequence of SEQ ID
No.: 18;
(i) DNA encoding the amino acid sequence of SEQ ID
No.: 22;
(j) DNA encoding the amino acid sequence of SEQ ID
No.: 23;
(k) DNA encoding the amino acid sequence of SEQ ID
No.: 24;
(l) DNA encoding the amino acid sequence of SEQ ID
No.: 28;
(m) DNA encoding the amino acid sequence of SEQ ID
No.: 32;
(n) DNA encoding the amino acid sequence of SEQ ID
No.: 36;
(o) a complement of a DNA of any one of (a) through
(n); and
(p) DNA which hybridizes to a DNA of any one of (a)
through (o) under stringent conditions.

8. A DNA probe comprising all or a characteristic portion
of DNA of Claim 4.

9. A DNA probe comprising all or a characteristic portion
of DNA of Claim 7.

DBX & DBY
long and short transcripts 



DBX -B56 ctttccccttactccgctcccctcttttccctccctctcctcccct 

DBX -810 tccctctgtcctctcctcctcttcccctccccccccccgtccggggcaccctatattcaagccaccgtttcctgcttcacaaaacggcca 

DBX -720 ccgcacgcgacacctacggtcacgtggcctgccgccctctcagtttcgggaetcCffcctagctcccactaagoggaggctacccgcggaa 

DBX -630 gagcgagggeagattagaccggagaaatcceaccaeatctccaagcccgggaactgagagaggaagaagagtgaaggccagtgttaggaa 

DBX -540 aaaaaaaaacaaaaacaaaaaaaacgaaaaacgaaagccgagtgcatagagtcggaaaggggagcgaatgcgcaaggctggaaagggggg 

DBX -450 cgaagaggcctaggttaacattttpaggcgtcttagccggtggaaagcgggagacgcaagttctcgcgagatctcgagaactccgaggct 

DBX -360 gagactagggttttagcggagagcacgggaagtgtagctcgagagaactgggacagcatttcgcaccctaagctccaaggaaggactgce 

DBX -270 aggggcgacaggaccaagtaggaaatcccttgagcttagacctgagggagcgcgcagtagccgggcagaagtcgccgcgacagggaattg 

DBX -180 cggtgtgagagggagggcacacgccgtacgcgctgacgtagccggccttccagcgggtatattagaticcg&ggccgcgcggtgcgctcca 

DBX -90 gagccgcagttcceccgcgag. .g.ccttc.cggcga.eaaaca. . .g.ttagcagcg.a.gact .g..c.,.g.a 

DBY -71, ccagcgtaagagctccgctaetcggteteacacctacagtggactacccgatttttcgcttctcttcaggg 

1. . . A. E . A I* G , . . F . G SDN. . . 

DBX 1 CA...G CG.TC.GG T GGC. .A TCA. .T. .T 

DBY 1 ATGAGTCATGTGGTGGTGAAAAATGACCCTGAACTGGACCAG<1A.G^ GAAAAACAGAGTGGAGGA 

1MSHVVVKKDPELDQQLANLDLNS -EKQS G^G 

30 - - R . . T R . ' . Y 

DBX 88 — — — C * • T T CG TA G T T C G 

DBY 88 GCiAAGiACAGCGAGCAAAGGG 

31 AS TAS KG RYX P P H 1» R N K £ A S K G.F.H D K D S S G 

60 . . S ' . S ; . . . S S F - . ~*.D™f . 

DBX 178 CT G A TAGT A G T A C TC - ■•-•^ 

i>By 178 TGGAGirh3CAGCAAAGATAAGGATGC^^ GATTCTAGAGGAAAGCCTGGTiATTTCAG 

60WSCSKDKDAY S S F G S R, -D S RGKPGYFS ERG 

89 „ " S . G D . S . . K 

Dfl* 268 G; T..C C GC . . .GGT. . C. . . AG A 

DBY 265 AGTGGATCAAGGGGAAGArTTGATGATCGTGGACGGAGTGACTATGATGGTATTGGCAATCGT GAAAGACCTGGCTTTGGCAGATTT 

. 90 S G S R G RFDDRG RSDYDG IGN R-ERPCFGRF 

118 . . G . K D 

DBX 35B TG A.C C A A C A..G... 

DBY 352 GAACGGAGTGGACATAGTCGTTGGTGTGACAAGTCAGITGAAGATGATTGG 

120 ERSGHSRWCDKSVEDDW5KP&PP5ERLEQE 

148 . N 

DBX 448 . -C.^^. C T. T^. C C..-.T A AC 

DBY 442 CTGTTTTCTGGAGGAAACACGGGGATTAACTTTGAGAAATATGATC 

150 LFSGGNTGIN FEKYDDI .\P. VE»ATGSNCP PHI 
178 . S . . . V E \V\ . 

DBX 538 . . A.G . . .C . . T. . .G G A G ...... T C. .A G. - . 

DBY 532 gagaattttag<:gatattgacatgggagaaattatcatggggaacatigaacotactcgct 

180ENFSDIDMGEI IMGNIELTRYTRPTPVGKH 

208 E . . . . M 

DBX 628 ..T C. -A. AG GA G A G. .G CT. . 

DBY 622 GCCATTCCTATTATTAAGGGAAAAAGAGACTTAGTCGCra 

210 AIPI IKGKRDLVACAQTGSGKTAAFLL P XL 

238 ...-S R.K 

DBX 718 T...T C. . G G. . .CA C C... 

DBY 712 AGTCAGATATATACAGATGGTCCAGGAGAAGCTTTGAAGGCTGTGAAGGAA 

240 SQ I Y TDGPGEALK.AVKENGRYGRRKQY P IS 

268 rt 

DBX 808 -A. . - A G G A C. .A A C..G 

DBY 802 TTGGTTTTAGCCCCAACAAGAGAATTGGCTGTACAGATCTATC 

270 I»VI>APTRELAVQIYEEARKFSYRSRVRPCV 

298 • 

DBX 898 C AG T 

DBY 892 GTTTATCGTGCTGCTCATAT^ 

300 VYGGADIGQQIRDLERG CHLL VATPGRL V I) 
328 . * 

^DBY 982 ATCATGGAAAGAGGAAAGA'rrGGATT 

330 MMER„GKIGLDFCKYLVLDEADRMIiDMG F E P 



DBX 1078 T A. A* C T. . ..G...T..C. .C T '. . 

DBY 1072 CAGATACGTCGTATAGTTGAACAAGATACTATGCCACCAAAGGGCGT^ 

360 Q X R R.I VEQ DTMP PKGVRK TMMFSATF P KEI 



388 

DBX 1168 G. T. .C. .A .■_.,€. .._ A T A 

DBY 1162 CAGATGCTTGCTCGTGACTTTTTGGATGAATAT^^ 

390 QMI/ARDFLDEYIFLAVGRVGSTSENITQKV 

418 . . . . E S L . N . . . K 

DBX 1258 A.C...C G. .T. . .C .CC. ,AA C.AG G. .C 

DBY 1252 GTTTGGGTGGAAGACTTAGATAAACGGTCATTTCTACTGG^ 

420 VWV EOLDKRSFLLOILGA Ti \G SDSLTLVFVE 

448 ?V 

DBX 1348 T T C. . A. . C. . C. . C T. .T. . . G . 

DBY 1342 ACCAAAAAGGGAGCAGATTCCCTCGAG 

478 ... A . 

DBX 1438 A A ■ .C. T -.A. .A A ...G. 

DBY 1432 CGAGAGGAGGCCCTTCACCAGTTTCGCTCAGGAA^ 

480 REEALHQFRSGKSPILVATA' VAARGLD S^N^ 

508 . K . . 

DBX 1528 A . . .C A T G * t' 

DBY 1522 GTGAGACATGTTATCAATTTTGATTTGCCAAGTGATATTGAAGAATATGTC 

510 VRH VINFDLPSDIEEYVHRIGRTGRVGNLG 
-538 R.I. 

DB %BY ^CCACCTCATTC^ 

568 S S 

DBX 1708 A C ... . - j. A T G. . . .GC. T G 

DBY 1702 TGGTWAAAATATCGCrrTATG^

FIG. 3A 
598

dbx mr 

DBY 1 



S . S S . A 

i A i ^CA^CAGC..C... i C, 



. K 
. .CC. 



"1^1 AGAGACTATCCACAAAGTAGTGGTTC^ 
600 HDYRQSSGSSS SGFGASRGSS S&SGGGGYG 



28 S 



DBY 1882 GACAGCAGAGGATTrTGGTGGAGG

630 DSRGFGGGGYGC F V H S DGYGGNYNSQGV D W 



658 



660 



DBY 1972 TCG^C^cfcSAatetgcttcgc^
660 W G N * 662 

DBX 2067 t c actg. .aCtttttt . . t g

DBY 2060 gtagcttcaagaacttgcagtacattaccagctgcgattctcctgacaa ttcaagggagctcaaagtcacaagaagaaaaat 

DBX short 2186 rgatcatgctcatctgtggagcaagtgcccccatgaaatgccatattttgtgaagaaagt 

DBX long 2155 c.t g gg ct.c g ctc.cct g...a.cc — 

DBY long 2142 gaaaggaaaaaacagcagccctacccagaaactggcttgaagatgtaatcgctccagtttggattaa^ctcttcccctcccgctttagtg



DBX short 
DBX long 



2245 gcatgcaggaatattcagggagtccagcatgtagtcatggcagccttaggtatttgagaccgaccaaccctcctgatgaagacaaccata 
2240 .. .t a t . .c. . . a tt. .t. . 



ax xong ££«u...c a t c a tt..t 

DBY long 2232 ccaccccaaactgcatttataattttgtgactgaggatcgtttgtttgctaacgtactgtgactttaactttagacaacttaccacttcg 
DBY short 2239 t aaaaaaaaaa 2248 

DBX short 2335 actcatgcagaacttggagcgtgatgcccagaagtgtgtgaactggtctgtgaccacaaagatgagaaccgcatgctgagattggtggaa 

DBX long 2330 a'. ...t. .«....* g g c 

DBY long 2322 atg&cctgttggctcagtaatgctcacgataccaattgtcttgacaaaataaatttactaaac&tggcctaaaatcaaaccttggcacag 



2425 tggagatttcagtgagcctacatgcagatgac*tggtgacacccgtgcccagcctgagctgttttcttctggccctcttattacatgaga 

DBX long 2420 ....g - a g ; . .at. t. . . .a. . . c.t. t. .g. . . .c g..t 

DBY long 2412 aggtatgatacaactttaacaggagtcatcaattcateeataaatataaaaagggaaaaaaacttaaggcagtagtctgcattaggactg 



DBX short 



DBX long 2683 . -c. gc. . .ca..ca. .gc.c.gt. . .caag.t.a.gcaag.

DBY long 2675 tataccaattaatatttttgaaagagttcttttaggttaatt— -— rt 



2515 aa aataaac acctatgcaccttggcct caaaaaaaaa a 2552 

DBX long 2505 . . - . .cac. e.g. . . .aaaac.ffa. . £5. . , .g att.atggaat c.aa.g. . . .a. : . tg. . t. . .g.. . . 

DBY long 2502 tttgagttttgcagacttggggttgggag— aacaccttaaagcattaaagca tagttttttgtatggccaaccttactaaatcaa 

DBX long 2595 a t . 1. t. .a.a.c. .g. t.g. . . »;g.a 

DBY long 2586 -gttctgacttgctcactctatcctggataggcacttgggaacttacacectttaagccattccagtcatgatgaggtggaatgtatcag 

a ....**...« c ... t . tc • c c .. a tea • » a * 

taagtacagcaatttctcatgtaatgtttagggag — — ttta 

DBX long 2769 . .a.g.g. . . .c. . . . t. tgg. . . .a. . . .t. . .g.g.,a g.c ct. .aaaataggttttta a,-.c . .tac 

DBY long 2756 ttctaacctaggcaaacg gcatgctatcacaagaaaggtttaaagctttgataaaatggg-- — - ggagaettaatc 

2859 ttagac a.g. c tg. .a ,gg. .* . t. ... . .ca. . .a c. . . . . t. .gg,c~. . . .a. . 

2826 - — agtttttttaatgcctgctataaa— aatttgaaatattagaatggcegaccatggcagtgaccaggcctcactacaggcctggttg 

22?? at. .t aa. . .c ; gag — . . .a.g a. . .aaa t. . . 

2913 gattctggcctt-taatgcatgctagtgttgatgttttttggtcaagaacggtttaaacaggaaggattg — tgcagcaggctttaactt 

2227 • - a- - cc. .... .c.t. a c g. . . .g c. . . . . .at c.t. . .aatggctc.ag. .a.g 

3000 aa-tgtagattcatactgctctgttaaagctgcattgaaatgttaaaatggcttacacttgcagactttgcaaa tcttaagac 

3124 ctgaaa. .t — .g g. . .g. .-. . .t . at caa. . . .getctaca t 

3082 taacaaa tccttgaaatcacacagcctgcaaatacgtactaaactgcacaaggtgtgtgttctatat gtgcagttctagc 

3214 cag t. gt. .a c. aac ; . - 

3162 gtattttagttgcataggtttccatggeatttatagtct-cttgtgctaaatttggccaaagatg attgtccaccactaaaaatgcc 

3303 . . . g t g. .a a. .' .gg aaa.cgct\c. tct. ...a..g.......g ae.g 

3248 tctcccacttggaattctgtactgattttgcggccaga-tgcaatgatctttaaaaacaaatctttt-caatggcataagaagttgacaa 

3393 c g \c...$4c\.a.taggt 

3336 aaatttcttaaagtgcaatagattttcaagt-tattgtgccttgttctaaa^t'tttaagtagg gcaettgacagtattgaggtca 

3ffi3 .... ....t c g.c tg. : cttc a 

3420 tttgttaaggtgctattteaattagtgtaggtttagactcttgtacatttctcccataactttttacaaagta ttttgttgcacat 

3573 — c ta.a c.t.. c tt.. 

3506 tcagagaattttatatatatatgtcttgtgtgggtgtcctcgaccttccaatcttatttcgtctcttggagattgttgaatgcagccagt 

3 661 ct.g. .gg. . .gggactaga.t a t g....a g g... c 

3596 g-aagaagtagat — tcctaaattttattggggaccatgg-aatggtagttgagaagaaaactatttgcacacaacagattt- 

3751 ct..a g 

3 674 tagatactttttgetgetag — ttgtgtaatatttattgaacattttgacaaatatttatttttgtaagcctaaaaatgattctttgaaa 

3841 - gc t .c . . 

3762 gtttaaagaaacttgaccaaaagacagtacaaaaaacactggcacttgaatgttgaatgtcaccgtat— gtgaaataatatattttggg 

3930 t a. a ct.a g g ca...c 

3850 gtagtgtgagcttttaatgtt-aagtc-tgttaaact — tgagtcaaattaagcagacccggcattggcaatgtagctgtaattttctga 
4020 tgtt.g.a..a g.g t tt c ; c. .caa. . 

3 936 caaaatttaagacaaaattgtcaacttgaaactaaaacatgccaaggttttgatatacttgtcttaagatattaatgaaacaattttgaa 

4 ii9 - - *-fft.. ..t g..t.t..g.tg..tgt.gtg..ga ta. . tg.g. . t . 1. 1. . . t-. a c 

4 02 6 cactgataggaag-gtccacatccacaaagtt t etc t tgagttt tgt t atgtgttttgctgtgt ttgattt tcagtgattgtc tggt 

• '~ g ' * a tff c * c — tgtattggcataat -gt a.caatagcatttgagcaag. . . . 

4112 atatttacagtcctcaaacatg — gttatttctgt cagtgactta-acattcgg tttt 

4 ?52 mtm • ** a - " • " a * * * fc g.tt.ca. .aatcatgtaaggatttaaac ag a 

4167 accagccagcagtattcttcagtaaataaagaatggaa ttgctgaatgtaatcattgaacctcgagtcac 

4377 . . , .gc — . .t.. . a. t g. .ca. . attaaa . . aaaaaaa 

4237 tgtaaaagttcagtaattgcttattgtattagttttagatoctgocaccgcatgtgctctgtttattctgattttactaa aataaaa aafc 
4464 aaa 4466*' 
4327 T^caaaagtc aaaaaaaaaa 4345 
TPRY

short, medium and long transcripts 



Medium 
Short 



Medium 
Short 



-1005 gctcatcgtttgttg

-990 tttagataatatcatgaactgataaatgcagttgccacgttgattccctagggcctggcttaccgactgaggtc.ataagatattatgcct 

-900 tctctttagacttggteagtggagaggaaatgggcaaagaaccagcctatggaggtgacaaggccttagggccaaaagtcttgagggt£a 

-810 aggcccagggcctgcgcagcttccctgccatgccccgcaaggtctcgcattcgcaaggcttgtgacagtaggagccecaccacggactct 

-720 cctaaagtccatggtgtcctcttttcgcatttgcgccccgtgggtgatgcccgatgccgcccttcccatcgctctcttccccttcaagcg 

-630 tatcgcaactgcaaaaacacccagcacagacaccccattttctatcttaatgcatttaactagcacaacctacaggttgttccatcccag 

-540 agactacccttttctccatagacgtgaccatcaaccaaccagcggtcagaatcagtcagcctctgtcacgctcctaggtccttggcgaac 

-450 tggctgggcggggtcccagcagcctaggagtacagtggagcaatgcctgacgtaagtc&acaaagatcacgtgagacgaatcagccgcct 

-360 agattggctacaactaagtggttgggagcggggaggtcgcggcggctgcgtggggttcgcccgtgacacaattacaacettgtgctggtg 

-270 ctggcaaagtttgtffattttaagaoa&tctgctgtgctctccagcactgcgagcttctgccttccctgtagtttcccagacgtgatccag 

-180 gtagccgagttccgctgcccgtgcttcggtagcttaagtctttgcctcagctttCCtccttgcagccgctgaggaggcgataaaactggc 

-90 gtcacagtctcaagcagcgattgaaggcgtcttttcaactacccgattaaggttgggtatcgtcgtgggacttggaaatt&gttgtttcc 



AGTGAAGA<^TCT^A«««ACA^AGGA^ 

' GAAGATXJGCGCCAGAACGAACAC^CTACTAGG^AAGGCTO 
ED G A RTKTLLGKAVRCYES LIfcKAEGKVES 

GACl'ltJ-l'l'X-l^CCAATTACKnXZACTTCA^ 

DFF'CQliGHFNI*.IiI*EDYSKAI#SA.YQRYYSLQ 
AAGTCrAGTTTAAACCATTrTCAGTTAGCCTTGAlTGACTGTAATC^ 

KSSLKKFQIjAIilDCNPCT. . L..SNAEIQFHIAH 

TOTATCAAACCCAGAGGAAGTAT^l^ 

ACTGTATTGCAACAGTTAGGTTG 
CAAAAGTCTTTGGAGGCAGATCCTAATTC^ 
TTTATATCTTACAGGCAATCTATTOATAAATCAGA 

C CT A TGG ATG ACAGGCAT AT ATTTGTC CTG TAG AATTGG AC CATG GG CATG C CGCAG^CTGG ATGG AC£TAGG TACTCTCT ATG AA 
TCC^CA^TCAACC^ 
ATTAAApTpACAGAATG^ 

ACACCACAGAAATTACAGCACTTGGAACAACTCCGAGCAAATAG 

TPQ KLQHLEQLRANRDNLNPAQKHQLEQLE 

agtcagtttgtcttaatg^gcaaatgagac^ 
^a|tgcctacaaactctg^^ 
tgtgttgaaaaactttrgtccag^ggagctt^ 
rrtgctatccagtaattgtatagcaggaa 
acagacctgaacagcagcacagaagagccatggaga^ 

T d L NSSTEEPWRKQLSNSAQGLHKSQSSCL 
^CAGGACCTAATGAAGAACAACCTCTC 



1 
1 

SI 
HI 

271 
91 

361 
121 

451 
151 

541 
181 

631 
211 

721 
241 

811 
271 

901 
301 

991 
331 

1081 
361 

1171 
391 

1261
421 

1351 
451 

1441 
481

1531 
511 

1621 
541 

1711 
571 

1801 
601 

"ell 

2071 
691 

2161 
721 

2251 
751 

2341 
781 

2431 
811 

2521 
841 

2611 
871 

2701 
901 

2791 
931 

2881 
961 

996 EENEKRTQHKDHSDNEST S S E N S G R 

29 86 rGAAGAAAATGAGAAAAGAACACAACACAAAGATCATTCAGATAACGAATCCACATC 

2971 CAGGAATCATTGAGAGCTGGAATGCAATGGTGTGATCTCAGCTCACTGCAGCCTCCGCC 

991 QES LRAGMQWCDLSSLQPPP PGFKRFSHLS 

R R K G P FKT I K FG TN IDLSDN K K W K L Q* LHE L 
3061 AGAACCAAAGGACCTTTTAAAACCATAAAATrrGGGACCAACATTG 
3061 CTCCCGAATAGCTCGAATTACAGG 

1021 L.PNSWNYRHLPSCPTNFCIFVETGFHHVQQ 



CATCTCACTCTGCCTAGTAATTCAGTACCAC^^ 

KLT L P SNSV PQGDADSHLS C 



_ ^TACTGCTACCTCAGGTGGACAACAAGGC 
HTATSGGQQG 



ATTATGTTTAu ^jj^g^^^"^ ^TT ^rT^^'Tl tT R™H "t"'G D ' f N **G~~C~ 

GCTGATGTCAJ^G^CTTT^AA 



ATTGCAGACAATCCTCAC _ 
I A D N P Q L 



GATTGGAAAAGCCAATGGCAATGTGGGTACTCGAACCTGCGAC^ 

SALLIGKANGNVGTGTCDKVNNI 



CACCCAGCTCTT^^^ 

GAGCAGAGAAGCATAAACAGTOTACCA 

AGCTCTACAAAAGTAGACCTGGCTTTAGCTAGCC^ 

S S T K V D LP L AS HRSTSQ I L jP SMS VS I C PS S 
ACAGAAGTTCTGAAAGCATCCAGGAATC 

CCAACTTCACCATACCCACCCTTGCCAAAGGACAAGTTGAATCCACCCACA 

PTS P Y PP LP KDKLNPPTPS\y. YLENKRDAFF 
CCTCCATTACATCAATTTTGTACAAATCCAAAAAACCCTGTTACAGTA^ 

P P LH QFCTN PKNPVTVIRGL.AGALKLDLGL 
'l"rCTCTACCAAAAC M rJr i \ W TAGAAGCrAACAATGAACATATGGTAGA^ 

F STX TLVEANNEHMVEVR, TQ LLQ PAD ENWD 
CCCACTGGAACAAAGAAAATCTGGCGTTGTGAAAGCAATAG^ 

P T GTKKIWRCESNRSHTTIAKYAQYQ ASSF 
~ J?! 1 T K £ ? AFX RVVSAGNLLTHVGKT I L . G M M T V O

Short 3151 CCTTCTCTTGWICTCCTGACCTC^ 

1051 ACLEfcLTSGCLLASASOSAGITOVSHHAR* 1079 



1081 L Y 



KVPGSRTP 



H Q E N N N 



Medium 3241 CTGTATATGAAAG-TTCCAGGGAGTCGGACACCAGGTCACCAAGAAAATAACAAl 
Short 3241 actcttaaaaatgtaagcaaaattacagtatgtaaaacacacattgctaatggj



VKIHIOPGD 
AACATAAATATTGGTCCAGGAGAT 



taaaaatgtaagcaaaattacagtatgtaaaacacacattgctaatggagaaataaagtccctacctttacatctaaaaaaaaaa 3330 



Hii * W PVVPEDYW'GVLNDFCEKNNLN FL H S S W 

Medium 3331 TGTGAATGCUTltmxrrACCTGAAGATTATTGCGGT^^

1141 W P N L E D L Y EANVPVYRFIORPGDLVWINAG 

Medium 3421 TGG CC CAAC CT1G AAG ATCTTTATGAAG CAAATGTCCCTGTG TATAGATTTATTCAGCGACCTGG AGATTTG GTCTGG ATAAATG CAGGC



Medium 



1171 TVHWVQTTVGWCNNIAWNVGPLTACQYKLAV- 
3511 ACTGTGCATTGGGTTCAAACTGTTGGCTGGTGCAATAACATTGC 



1201 E RYEW.NKLKSVKSPVPMVHLSWNMARNXKV 
Medium 3601 GAACGGTATGAATGGAACAAATTGAAAAGTGTGAAGTCACCAGTACCCATGGT



1231 S D- P K L F E M X K * 1240 

Medium 3691 TCAGATCC^UVAGLU'lVl'l'GAAATGATTAAGTAAgt gc c 1 1 c t gaaa c tgc tgc agt 1 1 c t c 1 1 1 gggggrt a t tgg t a geca 1 1 cag t a 1 1 
Long 3723 TTCTCTTTTGAAAATTCTGAAGCAATATCAGACATTC 

1241 Y CLLKILXQYQTLREAL V AA 



Medium 
Long 



3781 tttttcaaaagaafctctgeti 
3781 GG AAAAGAGG TTATATGGCA' 

1261 G KEVIWHGRTHDEPA 



3781 GG AAAAGAGG ITATATGGCAIGGGCGGACA^ 



H 



C.SI. CEVEVFNLLF 



Medium 3871 cacaagtgttataaaatctcataag attaaaa tattgccttccctt aaaaaaaaaa 3926 

Long 3871 GTCACTAATGAAAG<^TACTCAAAAAAL!L u lACAT 

1291 V T N E S KTQ'KTY IVH'C HDCARKTSKSL E.NFV 

Long 3961 GTG CTCGAACAGTAC^AAATGGAGGACCTAATC 

1321 VLEQYKMEDLIQVYDQFTLALSLS S S.S.^.-^ 1347 



Long 4051 tccatgaatattaaatgagattatttccgctcttcaggaaatttctgcaccactggttttgtagctgtttcataaaaccgttgactaaaa

Long 4141 gctatgtctatgcaaccttccaagaatagtatgtcaagcaactggacacagtgctgcctctgcttcaggacttaacatgctgatccagct

Long 4231 gtacttcagaaaaataatattoatcatatgttttgtgtacgtatgacaaacLgtcaaagtgacacagaatactgatttgaagatagcctt

Long 4321 ttccatgtttctctatttctgggccgatgaattaatattcatttgtattttaaccctgcagaattttccttagttaaaaacactttccta

Long 4411 gccggccatttcttcataagatagcaaatetaaatctctcctcgaecagcttttaaaaaatgtgtaetattatctgaggaagetttttac

Long 4501 cgccttatgcctttgtgtgctttgaggccatgatgattacatttgtggttccaaaataatttttttaaatattaatagcccatatacaaa

Long 4591 gataatggattgcacatagacaaagaaataaacttcagatttgtgatctttgtttctaaacttgatacagatttacaceatetataaata

Long 4681 cgtatttattgcctgaaaatatttgtgaatggaatgttgtttttttccagacgtaactgccattaaatactaaggagtcctgtagtttta

Long 4771 aacactactcctattacattttatatgtgtagataaaactgcttagtattatacagaaatttttattaaaattgttaaatgcttaaaggg

Long 4861 tttcccaatgtttgagtttaaaaaagactttctgaaaaaatccaetttttgttcatt^^caaacctaacgattatatgtattttatatgt

Long 4951 gtgtgtatgtgcacacacatgtataatatatacagaaacctcgatacataatitgtata'gattttaaaagttttattttttacatctatgg

Long 5041 tagtctttgaggtgcctattataaagtattacggaagtttgctgtttttaaa^rtaaatgtctcttagcgtgatttattaagttgtagtca

Long 5131 ccatagtgatagcccataaataattgctggaaaattgtattttataacagnagaaaacatatagtcagtgaagtaaatattctaaaggaa

Long 5221 acattatatagatttgataaatgttgtttataattaagagtttcttatggaaaagagattcagaatgataacctcttfctagagaacaaat

Long 5311 aagtgacttatttetttaaagctagatgactttgaaatgctataccgtcctgcttgtacaacatggtttggggtgaaggggaggaaagta

Long 5401 ttaaaaaatctatatcgctagtaaattgtaataagttct attaaaa cttgtatttcatacg aaaaaaaaaa 5471
TB4X & TB4Y

-, a . tgggaacagacagatcctttgttctgaggctcactcatctcccgagc 

TB4Y -720 cccgagccgtctcccagcctcagacggctctgcgggctgcatccgcgcagcctggcagcggcggcgctgcgccgcgacatcttcacagcc 
TB4Y -630 ctt?t?gc2gaggcatgtgtgctagggatgccgaaatgccgagagcgc^
TB4Y -540 cagtgcggcgcgggcgLgaactggtagg^
TB4Y -450 tllllwl^

TB4Y -270 tcaaattcctttattccggaacattccactttgagagggatctgtcctct^

TB4Y -180 cctcttttcctgtggaaagaggaagctcatffagcgcgaaacagcaggggacggagggcgagaagggcttcctcaggttgcgggtcggagg 
"«<r 3i gcagaagcacagt'c"^^ 



TB4X



P 

,c. .c. 



2^ * K * 44 c ~" 

ra TB4r 11 *£aAAGAAACTATO^ 

31SKETIEQERQAOES* 44 

ra Tfl4r 161 ctttCttitttcictttrtCt^ 

• ^ _ _ f totctat a.. .«••••••«••••••■•*•• S~ • a » ■•* ^ • 

7 *TB4Y 2?1 caaaga'ctSccgaaaatgg^ 

356 - - • .a- • t. .ggtggaagaagtggg b.a. . . . . .gtaaaaccaagccggcccaagcgccctgcaggctgta 

TB4Y 355 tgaaagacctagcgga gtgggagggcagtgaaatctaga 334

TB4X 446 atgcagtttaatcagagtgcca 467



FIG. 5 

EIF1AX & EIF1AY



eIF-1AX
eIF-1AY



eIF-1AX
eIF-1AY



eIF-1AX
eIF-1AY



eIF-1AX
eIF-1AY



eIF-1AX 3
eIF-1AY 3



nrtn ggcacgaggcgccatttgctgccgccgagcg 

elF-lAX -207 

eXF-lAX Ay : 13J eov«wggcra»cctctflMmctmecocca^ 

IIS «Icftg^gc^«^?«gSgIc!^ 

1 • • • * * * * * ■ * * ' * * * • _ • * * * * * " " ,.".a*.c!..!.- 
t ATCrcC^G^TJ^GGT^ 

3i. . . „- • • • G c »• * .a:. ^..:.c:. t:.c 

9 1 GATGGAgAA^AGTAlCCTCAGGTAATCAAAA^ 

121. . . r I .t! .cl . . • - tc.aca. . 

3 li -HraTCCTCGAGATCA^^ 

121 FG PGDDDE IQFDDlGODDt.UA wl -' J - 

. _ _ _ . «». at- a - ao . . . . tt .a . . t agcat .aa. . c.a- 

eI &F%AY ill kilmckigUkm^ 

m2 Sl&Lr Ml tcltSaaagcagActga^ 

Gl ZlF^XAY ill ttaa^actttfttg'ct^^ 

eI eiF^lAY 795 a"«£a"a*t£?gt?ta«gt?t« ^ 
eI &%AY III af^l^aa^^^ 
eI eXF^$AY- 971 SgacSetctgtffCCte§fftCatca« 

eIF-1AY 1061 acaagattaagagttaaagaaaccgaacaacaagtggcaaccaatcatcctaacattggaaacactggggcgccacttW
eIF-1AY 1151 ttaeLatcgEaltccactgtcctggctttcatgaacaag^
eIF-1AY 1241 tg aaaaaaaaaa 1252

FIG. 6 
DFFRX & DFFRY

DFFRY -1664 gaagtgacatgttggcatgggcccaattctgctggtcctttagt 
DFFRY -1620 atacaaaaaaaataaaggcttaccagtatgtcaccacaCgcagatctatggattgtacagaaaattggcgattcccaaatttcactgtgc 
DFFRY -1530 accaaaataatcgacggaaccttaaagactaaaQatttctagaccccaccccaggcccgatgaCtgagaatatctagaggggacccaaga 
DFFRY -1440 acccatacatttaagtgccccacccacaacaatgacctttaagcaggtagCttgcaCttgggaaccaccgctacaggttactagtgggac 
DFFRY -1350 aaccagccaggagcataagcttgaacattttacagcccgtcacccgcgatagcCcaccaccCgtgaCaCaaccagaaaCccaattaagat 
DFFRY -1260 tgctacctctctgcaatctgrt C tgcaa tttoggtgC taat etc tt tganagt tcagaaaaaagtagacaaaacagaaaagaaatcaagta 
DFFRY -1170 caaccacatzaacgacaaaaaacgcattacactitgcactaaacctcaaaactggagaacaaaggcgcaacataacatgaaaacaatCaaaL 
DFFRY -1080 gctaagcgaaataataccaaatgtagttgaccctgaagaaaacgcagtagcgagggacccctaacctgtgggccctccaggaactacCgc 
DFFRY -990 tgaacggtcCtgagaacccactggaaaagaccaagcattgttacccgaacaactgaacCCCgCttatttctccatatCCCtgcagCggta 
DFFRY -900 acccca ttataaaacctaatgaaacaatgtttttacagacggcgtggaaagacttttctgggctcagaggtgaaactgacccttgtgtat
DFFRY -810 cagcagcattcctgactgactgagagagcgtagcgactaacagagttgtgatgttagctaagaaacttagattcgccattgtagcttctc 
DFFRY -720 taccaattagcagatcgtctaactcaccgaoattgcaaagtggtauacfftggacttagtcattacCgagcagcttatgaattgtattcat 
DFFRY -630 ttactcatgatgtaaaaatggttagtctccacctttaoggctctagttctagcggctaaataggtacttatttatacagtatgataactg 
DFFRY -540 ctgtactaaaatacatgtctcaaaCgtggaatagtagaagaggCgaagaaaatcatagcttgaggtagaa&actgtttgccggtcttaaa 
DFFRY -450 aactgtggtattttgstgatcccataaattaggtcagtttacttccactggagggaaacagtttaaaggatatatgtgatactattaatag 
DFFRY -360 aatgaggaagacacaccagatatttaggagggaattagcgagcttgaaactaagagctggtttgaatgagactgggtcataagtgatttc
DFFRY -270 aagtaccagaCCaaggcaccgagactttatttttaagcaccgaagccagat tttttcctttcaaaagaaaggattcatgatgaaatctgc
DFFRY -180 ctttcgccttgcagagagcttggagataattctggtggctgtgtggagtatgtgttggaggtatcaaattttcacagtatatataaggca 

DFFRX -59 c.tttct. . ag.ca.ctac. t. .gc. . .c. • • .tt- cc g... 

DFFRY -90 gcaattgataggcctttcacagnttcttctgataaccacataaagagacaaAoaaaagaaaaaagagcaaagatctgrtgctgtgtcaagc 



DFFRX 
DFFRY 



DFFRX 
-180 agtgcgcccggcgacacaacctttgaggctgtcacaccaccaagatcatattgtatcactggaccagcataaagccff&cacctctgacca
-90 tgcccagccttcaoataatactacactgtataattggctcaacacccaggtgatattgttccacccacctgasroccagatajiaaagccta 

1 ATGATGACGCTTGTCCCCAGAGCCAG^ACACGTCGCAGGACAGGATCATrACTCTCATCCCTC 

1 M M T L V P R A R T R AGCDHY.S-HPC P RFSQVLLT 
9 1 GAG<^CATCATGACATATTGCrrTGACAAAGAACCTAAGTGATC 

31 EGI MTYCLTKNLSDVNI LHRLLKNGNVRNT

181 ari XjClTCAGTCCAAAGTGGgCTTGCTXSACJVTATTATGTGAAACTGT 
61 LLQSKVGLLTYVVKLYPGEVTLLTR.PS I Q M 

271 AGATTATGCTGTATCACTGGCTCAGTGTTOAAGCCCAGATCACAGAAGTAAttgtgCCatata 

91 R Xj C C ITGSVS K P R S Q X * 106

361 catcca&cgtggc&ctgccttcaaagggaaattttacatatgccactgggaccaccacccagacgacgtcctgcccactaaaagaattgc 

451 gacataacgctgactgcaaaaactgggtaatgcaaccctcctctttattccggagtccgccaaaacaagggattatcacatattgcggag 

541 tccagcacccaggtaaaacttcgtcatatacccagcttcagataccatgcaacgacacaactatcatacctggacccaaagaggagagat: 

631 attccgacccccattgccaCtcttatggccacaagcaaagtaaCggccctcacagt;ggtaCaaagl:tcacacagt:at:tatgacaccccca 

721 gcgtatcatagaaaatgtgagtagtacaatgagtgtcataacagggaacagcaaadipaatgctattgcgactactggattcacacccagc 

811 tqacgcgactatcattctctcacaagaacagaacctgca aat^aag tactaaatctcacc aaaaaaaaaa 880 



FIG. 10 



XKRY 



-663 attaaaaaettctgataaaattacctaagtaca 

-630 cacaaacaaaaacatgcccacacaaatcactcaatttctaaaacttttaatttttctgcttccctagtacctcgtattccaccacacagc 

-540 aaaatctggcagctccacttccagaatttacttgaactccacagcttatttccgatttcctgttatcaccagagtctaaaacacagCCta 

-450 cattgcattcacctcctatcttacaccgtaatttcccactttacactctaactttatataaaaaagaaactaccttctcaagatctaatt 

-360 cacgcaa t ttta tttgcccc taa ttgagaettcfcttctaggtgctgtcacaccttgtaacgtcagatacaaatgtcccta tccaa t ttca 

-270 tgagttccagccattttatttcoagggaatgtgtatacacatttataaatttgtgtatgcgtgtactcacttattctttattctatatgt 

-180 tttgcatgcatatattcactaaatccctgataatagaaagataacaaatctttctttttctttetttcttgtatBtaaattattttccga 

-90 aggaggtgggctgggagaaatatatcttaacttggcaagtctaaaagagaaogtggccattactaatgaaaattactctctagcatctcc 

1 ATGTTTATCTTTAATAG<^TTGCTGATG A CATATTCCCTCTTATCAGTTGTGTAGGTGC CATTCACTG CAATATACTGGCCATCCGCACT 
IMF I F* N S IADD IFPIiISCVGAIHCNILAIRT 

91 GGCAACGACTTTGC1X3CCATTAAGCTACAGGTGATAAAA

31 GNDFAAIKLQV1KLIYLMIWHSLVIISPVV

181 ACTCTGGCATTCTTCCCTCCJITCTCTGAAACAGCGC 
61TLAFFPASLKQGSLHFLLIIYFVLLLTPWL 

271 GAGTTTTCGAAAAGTGGAACTCATCTTCCTAGCAACACAAA 
91EFSKSGTHLPSNTKIIPAW- WVSMDAYLNHA 

361 AGTATAl^CT ^ CCATCAA l *l£TCCTGC'TlX3TCAGCACTCAAACTC

121 SI CCHQ FSCLSAVKLQLSH^EELIRDTRWDI 

451 CAATCCTACACTACAGATTTCAGTTTTTAGaaaatgcgataataatattgacat^tiagttcctctggagggBacgttttaccgaagtgct
151 Q S Y T T D F S F * 159

541 gtgactcaataattgccgtgtagttcatcaaaacctacatattagcctttggcctcaagctccgcttctgtcagtatttgcaaccaaggt 

631 ggtcgggcaaagtaetgccaggagatactgaaaatcatccagaagcaccgcgatattgtgtaagcatctggagaaaattcagttaaaaga 

721 acaaaagtaagcagccgaggaactactatcactcatggagaagggcaggatattctcaataagtgagtacgcaatacccatatatacttt 

811 cacagaacaaagagtaaagaggctgagtgtgacttcataaagatactcatgaaaaatataaacaacaaaaccttggaagtagtttct aat 

901 aaaa ccgatttttct aaaaaaaaaa 925 



FIG. 11 
PTPRY

-162 aa 
-180 oaagaggagcacaccacaccagaaacagocatcctgcagcgcttcactfftctcaaccctatctgcacagtccgaggtcagtctgagaoag 
-90 cctctgagagacccaggatgaagggacgcagtgaggtcaagagcccaaccctccttcactgacacccacccctaaggactcagaagagac 

1 ATCAATAJU^TGGGCCTCAACAATC^ 

1MNKMGLNN PKKNH SRTMGATGLG FLItPWK Q 
91 GACAATTOAATGGCACTGACTCCCAGGGATC^

31 DNLNGTDCQGCNI LVFSETTGSMCSELSLN

181 AGAGGTCTTGAGGCCAGAAGGAAGAAGGATCTTAAAGACTCATTTCTCTGGM 
61RGLEARRKKDLKDSFLWRYGKVGCISLPLR 

271 GAGATGACCGCCTGGATTAACCCACCCCAAATTTCAGAGATTTTCCAAGGCTACCACCAGA 
91EMTAWINPPQ X S E I F Q . G Y H QRV H GA.DA L S L 

361 CAAACCAACTCTCTGAGAAGCAGGTTATCTTCACAGTGCCTC

121 Q TNS L R S RLS S QC h G Q S FLLRTL E R A V V S * _Q 
451 CACTTGGGGACATCTGTGGCCACGTTCATGAAGAAGACTAAGCCTACTTCATCTCA 

151 HLGTSVATFMKKTKPTSSQDPPKSGRGFGT 
541 CCTGCGGTCGGGTCCACCATGAGGATAAAACCTCCTTCTCTTC^ 

181 PAVGSTMRIKPPSLLDMSRSGRCYKSPGAT 
631 ACCAGGGTGAGAATAAAGACGTCTCCTCAGGACCCTCCCAOT 

211 TRVRIKT SPQDPPRRVHG. IETSGGQVRKRH 
721 CCTCTCTCCA^CACCCAGAAX^^ 

811 cctcgagcgaggcagtgaccacgcactgtcacagctaccaaagtgtggtctgcagatgacctgggcttgtttctggcagagactctggta 
901 cagagaaaggagaggcgttgagtggaaccacgatgggccgaggccaggggagacaecacaacccccaacaacactttttttcatgcttta 
991 ataaa tcattttccttagagaactaaagtagttgaaacaatatagaaacattttttaaqtaggcat aaaaaaaaaa 1066 

FIG. 12 

TTY1

tgrctgtcagagctgtcagcctgcttaagcagagtaaaatggcacaggcagtgcagcctggtagcgagaaaaaaggctgcctgtgaaatc 
ccactgcgggaccataagtggggacctcagggcccctccatggcatccccacggccacgccacgccggagaaggaggcgctt^aagaatg 
Cgagccgatcgccggaaactgctcacccgaccccagcctcaaaagaggccatgtgcaagaaccgggtgaagttgtgagacpccatccacc 
ccccacaagaccgcatccecaccct:gcct:gaccctaccgctgcccaaactaccCgt:ccaaggacgaaaacccaggacaaaggaggagt:aa 
cccticacgatgcgaagcacgcgctcacctgcgaacataacccgaggaccatgagac&atictgcggattccacagagaagacagacgagaa 
gacaccgctigacactztctccacggaggtcccccttcccaccaagacgcagacgcctc&cgcaaggactaccccgcgaatcccacagagaa 
gacaggtgtggttccaatgccggcgcacctccagggaattcccctcctctaccaagctccaggccctctgccacgatcatgagactactt 
gtggatttcaeagagaagacaggcgaaggtacagcacggcatccacccctcaccagaggggtatccccacccctaeccgaccceaetacc 
ttattgctgttcaaagcctccaccccagactgaaatcccaagacaatggagaagtcccccctgatgatgtgaagcaccaactcctctggg 
aaccaaacccgaggtaaatctaataggcccggcagagatgaatgatagtgtctctccttggattggctgaaagacaactaaacactggta 
tatttccgtc aaaaaaaaaa 

FIG. 13 

TTY2 

aggcttgccatcaccacagatggcctctgagacactgtttgaaccacatctgcacctgtgagaggccagtttgaggtatgagaacactgt: 

ttcaacttggacttgcctttgccctggttcctgctttccccagacggcacccacccaacccaggacgaatgagcgcagagaggccaagtg 

ccaggccatcttccgctgacacccttctccggtatttcaggtatangtccatcatccaaagactgctcaacacctcaccagaatatattt 

caaccctcatggggcacgacccnctcacaaaacccctttcaggaatggagtcagaagagtagtttccagagacaacctcacagtcttgga 

acggctctgcctcccatgcgatctgaccatggagatggcatataagggccctaagtttgagacctctagggtactgcaatgcgttatcac 

aggcagcctttatcccgataccaagccagctctgcctgtaccactttcccccgcttaggcaggctgacagccctgacaccctggtgcCcc 

agcttgagtcaccatracgcggacgtgccagtcctggggcaatggacccgagccgcgagccgtagctagcgccacaatgaacgccagcect 

gccagtaacaaccccccticggcttggcagagaaggggacctccgtiggaggtacaatiggCggtgcaccgtcacccgtctcctccgtgggat 

ccacgggacagccccatgatcctaggagagggtagafcgtgagccagccegaagaaatgtcaagcagagccccaggaatgaagcacaaaat 

caccacagatccaaaaggacctgcagaattcgtcaggcccgcccagacattgcaggggttagtcttatcgaaatgtgtcccactgtaatt 

tccaacttcagcctccctgcgtccccagcagtttctccctcccaggtggggccttctgcagaatgacacagcctcagaagctactgggct 

gtgcgctactgtgggagtgttgcgagtgccggatgtcagcatgtgtgtgtggccttgtgtgtgtgtgtaggcgtctgCgcgtgtgtatfft 

aaacgaatecegcggatcaggaatcagcaatgactiagttaagctgcccgtgaccagccgggctccccatcgcctgcccctgccaaaaaaa 

caggtactcttctacaaagaagaggagagcaccacacccaagaacagacatctcccagtgttgcattataaagcagccaacccacagaca 

ctagcactctggcctgcatagcccctttaatttacctagaactcagttcccagccaagtaggtgcttcacgtcctgagggtgcaatcctc 

catcaecttgagatttcatgctggtacagagagtgtgacagcaataaggtcagataggggtgagtatacaacctggtgaagggtggatgg 

ggccccgtaccttcaccagcaaaaagggtgaaaatagatgacacagaacgtgcttccaactccatccccacattcccataaccgcaaaac 

cagccaacaacatggcctggtgttcaggtgggagtactccaacctgcaggaagaatttggagtgcaaattgtggccaacctggaaaactc 

ctggtttgagggttttaatacctgcagtcaaatggaagtggaatagactgatgctgggtgggttgtggcctccacatccgtgtcctctct 

taccgacttccattgtcctcattggtgtagggctccctggacctggcccaacaccctccacactaaactcttccccgctcacagaagacc 

atcataaaaatgcattgcagaggccctgcaaggaccaggaegaagggagacagtgaggtcaagagcccagccatcttitcactgacaccca 

ctcctgggttc'tcaggctggccgacaggcccgacagcccatcacgaaagcetgcatactcttagacacaaggactgagccatgggctcca 

gccagcaccacaacgaaggccaccattgcctagggataagtccctgtgactttgtggataagaaccccgtggagccaacccaaggagaga 

caccatcattcacccctaccgggcctattaaacccacctcgaatttgacccccagaggagttggtgctccacatcaccagggggaacctc 

tccaccgtcccgggacrtcagtctgggacagagactccgaacagcaacaagcttccctggtccggcccaacgccttctaaacccaacatc 

ccccagttcatggaaaacgaccctcatgggactctattgcaaaagctgcacgattcccggagaacgcctcattgtcccggagtcaaccca 

aaaaccaacactaccaytcatgttgcagagctccctgaaccaaccccgaactcagtctccagctgagcagctgccccacgttgtgagggg 

gcaatcctccatcatcctgggacttcattctgggacacagagtttgagcagcaataaggetgggtccacactgcccctcaacagcattag 

tggacacgattgccagacctgcaatttccgcagacaccttctgtgaacatttttcaacaccatecacacgagcgagagaccegctcgaca 

tgtaagaacaccgcctgacctcggacccgcccttgccgtggtccccgcctctctc^ti^gaccccccgccaggcccaggacgataggaggc 

aacgaagccaagggccgagccccattcattgaaagccgactccggggccccgggcataatcccatcacacaaaatccccccaacaaccca 

ccagactacactccaacctccacggaacccgatccttgcacacagcctcttccaggMtggagccagaagageagtcttcagagaccacc 

tcagtttggaaaagcctccCcctccagtggctcccagccacggagtcatcgcgaaggggctccacggccaacaattttacggcactgcac 

ttggttatcacagacagaccccttcacgacagcatgccatccctgcctataccatttfccccccgcccaggcaggctgacaaccctgacag 

ccaggggcccgaatctacctggcaaacgtgcacgctccactetcagcgcaaaaggcctgtttgggagccctgactagtgtcac aataaac 

gccgccattgcctagCg aaaaaaaaaa " 

FIG. 14
Human CDYL (CDYLike)



agagagaggacctatttctacctaaggacattcccggaaggcaatgggtttcaaacaatat 
cctgaagagactcatctcggggaactaagcaggtggtaatcagagaacacagagcccccgg 
aagaattttatggcatttcaggcaagccacaggccagcctggggaaaaagcaggaagaaaa 
actogcaatacgagggcccaacccaaaagttattcctgaagagaaacaacgtgrcagcacc 
agatgggccttcagaccccagcatctccgcgagcagtgagcaaagcggggcacagcagcct 
cccggtttacaggttgaaaggattgttgacaaaaggaaaaataaaaaagggaagacagagt 
atttggttcggtggaaaggctatgacagcgaggacgacacttgggagccggaacagcacct 
catgaactgtgaggaatacatccacgacttcaacagacgccacacggagaagcagaaggag 
agcacattgaccagaacaaacaggacctctcccaacaatgctaggaaacaaatctccagat 
ccaccaacagcaacttttctaagacctctcctaaggcactcgtgattgggaaagaccacga 
atccaaaaacagccagctgtttgctgccagccagaagttcaggaagaacacagctccatct 
ctctccagccagaagaacatggacctagcgaagtcaggtatcaagatccxcgtgcctaaaa 
gccccgttaagagcaggaccgcagtggacggctttcagagcgagagccctgagaaactgga 
ccccgtcgagcagggtcaggaggacacagtggcacccgaagtggcagcggaaaagccggtc 
ggagctttattoggccccggtgccgagagggccaggatggggagcaggcccaggatacacc 
cactagtgcctcaggtgcccggccctgtgactgcagccatggccacaggcttagctgttaa 
coggaaaggtacatctccgttcatggatgcattaacagccaatgggacaaccaacatacag 
acatctgttacaggagtgactgccagcaaaaggaaatttattgacgacagaagagaccagc 
cttttgacaagcgattgcgtttcagcgtgaggcaaacagaaagtgcctacagatacagaga 
tattgtggtcaggaagcaggatggcttcacccacatcttgttatccacaaagtcctcagag 
aataactcactaaatccagaggtaatgagagaagtccagagtgctctgagcacggccgctg 
ccgatgacagcaagctggtactgctcagcgccgttggcagcgtcttctgttgtggacttga 
ctttatttattttatacgacgtctgacagatgacaggaaaagagaaagcactaaaatggca 
gaagctatcagaaacttcgtgaatactttcattcaatttaagaagcccattattgtagcag 
tcaatggcccagccattggtctaggagcatctatattgcctctttgcgatgtggtttgggc 
taatgaaaaggcttggtttcaaacaccctataccaccttcggacagagtccagacggctgt 
tctaccgttatgtttcccaagataatgggaggagcatctgcaaacgagatgctgctcagtg 
gacggaagctaacagcgcaggaggcgtgtggcaagggcctggtctcccaggtgtuttggcc 
cgggacgttcactcaggaagtgatggttcgcattaaggagcttgcctcgtgcaatccagtt 
gtgcttgaggaatccaaagccctcgtgcgctgcaacatgaagatggagctggagcaggcca 
acgagagggagtgtgaggtgctgaagaaaatctggggctcggcccaggggatggactccat 
gttaaagtacttgcagaggaagatcgatgagttctgagtgtcgggctgcccactggtgaca 
ccgggatcgggctgagcaggagaacatcaccggctccagttcccctgatccactctcacag 
ccEgaaacaagctcacccgtagcttacgcttggaagcaggactgggaacatccacgctatt 
tattatcgaggagttttaaagtactgtaactttaaaataaataactacaaagcttctttgt 
cvaaacgtcattattttatacttatatacacgcaggtgtaaaagtataaaggtgagcacta 
gactgctcttagaagctctaatttttgttttctttggctagtactgtataaaaaacagaat 
tgtgttttattggttttggatgacagaaaagtctggaataatgtttgttttcctcatttct 
tccttctaaaacacagaatctaagggggtgttagccagcctcgcctccctgccccacgtag 
agacacagagtgatgtgaggcgttggctttttctccaagaaggtacagatacctcagattc 



gggaaactcaaaatcaaaagacttagcttctaggataaatacttctgatgaaaaatccgct 




gaagaaagcttgttttgcagtattagtgaatcactgaatagcttaagtatgactatctaag 
IJalaagttagtctttagtgggttttaaatagtttttctgacccttctgaaaaataactac 
ataagtgcttcttgttgctgggtgagaaatactactttatagacagttttggttutctgtt 
tgcagatataattgatgtatttcaccaaaataaaatatttttatgtttat^aagugraatt 
tttaggttcacttagaatatattttatttaataagctaaaattcttttggcacactattaa 

atgcaaaaactcctttc 

FIG. 15 
Mouse Cdyl (CDYlike)



ctttgaggtggtttagcatcccacttgttccttgaggacatctgttcctacctaagagcac 

tcacctgagatgctcaaaggtccagaagaaacacttctcgggtgacaaagcaggtggtgac 

cagagaacagaggccccccaaaaattttatggcattcaaggcaaagcacagccaacccgga 

gggaaagcaagagtccagcctggaaatacatagcccaacccgaaggttatctctgaaggaa 

aacaatgggcataggcaatagccagcctaattcacaggaagcccagctctgcacacttcca 

gagaaagctgaacaacctactgatgataacacctgccagcaaaataatgtggttcctgcaa 

cagtctcagaacccgatcaagcgtcccctgcaattcaagacgcggagactcaggtggaaag 

tatcgttgacaaaaggaaaaacaagaaagggaagacagaatatctggtgcggtggaaaggc 

tatgacagtgaggatgacacgtgggagcctgagcagcacctggtgaactgtgaggaataca 

tccatgacttcaaccggcgccacaacgagaggcaaaaggaaggtagcctggctcgtgccag 

cagagcctcccccagcaacgcccggaagcagatttccaggtccacccacagcactctctcc 

aagaccaactccaaagcacttgtggtaggcaaagatcatgagtccaaaagcagccagctgt 

tggctgccagccagaagttcaggaaaaacccagccccatctcttgcaaaccgcaagaacat 

ggacctcgccaagtcagggatcaaaattctcgtgcctaagagccccgttaagggcaggacc 

tcggttgatggctttcagggggagagccccgagaagctggaccctgtggatcagggtgccg 

aggacactgtagccccagaggtgactgcagagaagcccactggggctttgctgggccctgg 

tgcggagcgagccaggatggggagcaggccccgaatacatccactagtgcctcaggtttct 

ggccccgtgactgctgccatggccacaggcttagctgttaatggaaaaggtacatctccat 

tcatggatgcgctagcagccaacggaacagtcaccatacagacatccgtaacaggagtgac 

agccgggaaaaggaaatttattgacgacagaagagaccaaccttttgacaagcggttgcgt 

ttcagtgtgaggcagacagagagtgcctacagatacagagatattgtcgtcaggaagcaag 

atggcttcacccacatcttgttatccacaaaatcgtcagagaataactcactaaacccaga 

ggtgatgaaagaagtrcagagcgccctgagcacagctgcagccgacgacagcaagctggtt 

ctgctcagcgccgtgggcagcgtcttctgctgtggtctggactttatttattttattcggc 

gcctcacagatgaccgaaagagagaaagcactaaaatggcagacgctatcagaaacttcgt 

gaatactttcattcagtttaagaagcctattattgtagctgttaatggcccagccattgga 

ctaggagcatccatattgcctctttgtgatgtggtttgggctaacgaaaaggcttggtttc 

aaacaccctataccaccttcggacagagtccagatggctgctctaccgttatgtttcccaa 

gattatgggaggagcatctgcgaatgaaatgctgttcagtgggcggaagttgacggcacag 

gaggcctgtggcaagggtctggtctcccaggtgttttggccaggaaccttcacacaggaag 

tcatggttcgaatcaaggagctggcttcatgtaacccagttgtcctggaggaatccaaagc 

cctggtgcgctgcaatatgaagatggagctagagcaggccaatgagagagaatgtgaagtg 

ctgaagaagatctggggctccgcccagggcatggactccatgttaaagtacttacagagga 

aaatcgatgagttctgatgggcaggctgagcaggacatcggtggctcccacttgctacgtc 

gtcctgcagtggctcgtgcttggaggcagaactggaaacatccgagctatttattgccgcg 

gagtttttaagtactgtaactttaaaataaatacaaagcttctttgtctaagcgtctttat 

tttatactcatgtatacacaagtataaaaatgtaattgagcactaggctgctcttggaagc 

tctaattttcttgtaagctagttgtggatttttgttttgtttttgtttttaaaaggaatta 

tgttttcattttgggtgacagaagagtttgaaataatgtttgttttactcttttttttttt 

ccttaaatctagatcacagaccctcaaaattactagccagccttctccccctccctctact 

gaaacatgtagaaatacttaaacatgttcctgcctctaggggggagggggaggtgtgagta 

cctcaatgctgaaaacagttctgatcaaacttaagaccaacctggtaaaaaaagcatcact 

gatggaaaatcccacccacgggggcgtgggtttctgctgaaatgcccgccgctctaccttt 

cttactgtcccattcttacccagccaccgtgaagagcccagtgtctggaggaaagcaggtg 

gtccagtgtctgtgagtcactccgtagctcgagtgttacttgctaagttatgaattagcat 

tagtgggtttaaatagtttttctgaccctttttgaaaaataactacataagtactccttgt 

ggctgggtgagaaatactactttgcatagttttgtttgtctatctgcagatatgattgctg 

tattacaccaaaagtattttttatgtttataaagtgtaatttttaggttcacttagaatat 

attttatttaatttaaaattctcttggcacactattaaatacgcaaactcctttc 

FIG. 16
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Human VCP (Variably Charged Protein) family 



VCP2r (VCP with 2 repeats) 

gttgcgagacgttgagctgcggaagatgagtccaaagccgagagcctcgggacctccggcc 
aaggccacggaggcaggaaagaggaagtcctcctctcagccgagccccagtgacccgaaga 
agaagactaccaaggtggccgagaagggaaaagcagttcgtagagggagacgcgggaagaa 
aggggctgcgacaaagatggcggccgtgacggcacctgaggcggagagcgggccagcggca 
cccggccccagcgaccagcccagccaggagctccctcagcacgagctgccgccggaggagc 
cagtgagcgaggggacccagcacgaccccccgagtcaggaggccgagctggaggaaccact 
gagtcaggagagcgaggtggaagaaccactgactgtgtggatggccagcttttcccctgtc 
tccgagagcagcgactaagttcaggcccagccgccagacctcagagatctcaccagcgggg 
tgcttgccattctgaagataataaaatgaatgtgttgcaaattgaaaaaaaaaa 



VCP8r (VCP with 8 repeats) 

cggaagatgagtccaaagccgagagcctcgggacctccggccaaggccacggaggcaggaa 

agaggaagtcctcctctcagccgagccccagtgacccgaagaagaagactaccaaggtggc 

caagaagggaaaagcagttcgtagagggagacgcgggaagaaaggggctgcgacaaagatg 

gcggccgtgacggcacctgaggcggagagcgggccagcggcacccggccccagcgaccagc 

ccagccaggagctccctcagcacgagctgccgccggaggagccagtgagcgaggggaccca 

gcacgaccccctgagtcaggaggccgagctggaggaaccactgagtcaggagagcgaggtg 

gaagaaccactgagtcaggagagccaggtggaggaaccactgagtcaggagagcgaggtgg 

aggaaccgctgagtcaggagagccaggtggaagaaccactgagtcaggagagcgaggtgga 

ggaaccactgagtcaggagagccaggtggaggaaccactgagtcaggagagcgagatggaa 

gaactaccgagtgtgtagacggccagctactcccctatctccgagagcagcgactaagttc 

aggcccagccgccagacctcagagatctcaccagcggggtgcttgccattctgaagataat 

aaaatgaatgtgttgcaaattgaaaaaaaaaa 



VCP10r (VCP with 10 repeats)

cgttgcgagacgttgagctgcggaagatgagtccaaagccgagagcctcgggacctccggc

caaggccacggaggcaggaaagaggaagtcctcctctcagccgagccccagtgacccgaag 

aagaagactaccaaggtggccaagaagggaaaagcagttcgtagagggagacgcgggaaga 

aaggggctgcgacaaagatggcggccgtgacggcacctgaggcggagagcgggccagcggc 

acccggccccagcgaccagcccagccaggagctccctcagcacgagctgccgccggaggag 

ccagtgagcgaggggacccagcacgaccccctgagtcaggaggccgagctggaggaaccac 

tgagtcaggagagcgaggtggaagaaccactgagtcaggagagccaggtggaggaaccact 

gagtcaggagagcgaggtggaagaaccactgagtcaggagagccaggtggaggaaccactg 

agtcaggagagcgaggtggaggaaccactgagtcaggagagccaggtggaggaaccactga 

atcacrgagagcgagatggaagaaccactgagtcaggagagccaggtggaggaaccaccgag 

tcaggagagcgagatggaagaactaccgagtgtgtagacggccaagtactcccctatctcc 

gagagcagcgactaagttcaggcccagccgccagacctcagagatctcaccagcggggtgc 

ttgccattctgaagataataaaatgaatgtgttgcaaattgaaaaaaaaaa

GOVERNMENT SUPPORT

The invention described herein was made in whole or in
part with government support under Grant Number HG00257
awarded by the National Institutes of Health. The United
States Government has certain rights in the invention.

RELATED APPLICATIONS 

This application claims the benefit of U.S. 
Provisional Application No. 60/041,877, filed April 11, 
1997, entitled "Genes in the Non-Recombining Region of the 
Y Chromosome" by Bruce T. Lahn and David C. Page. The
entire teachings of the above-referenced application are
expressly incorporated herein by reference.

BACKGROUND OF THE INVENTION

The human Y chromosome is distinguished from all other 
nuclear chromosomes by four characteristics: the absence of 
recombination, its presence in males only, its common 
ancestry and persistent meiotic relationship with the X 
chromosome, and the tendency of its genes to degenerate 
during evolution (J. J. Bull, Evolution of Sex Determining 
Mechanisms (Benjamin Cummings, Menlo Park, CA, 1983); J. A.
Graves, Annu. Rev. Genet. 30:233 (1996); B. Charlesworth,
Curr. Biol. 6:149 (1996); W. R. Rice, Bioscience 46:331
(1996)). To be precise, these distinctive characteristics
apply only to the non-recombining portion or region of the 
Y chromosome (NRY), which comprises 95% of the human Y
chromosome. The remaining 5% of the chromosome is composed
of two pseudoautosomal regions that maintain sequence
identity with the X chromosome by meiotic recombination (H.
J. Cooke et al., Nature 317:687 (1985); M. C. Simmler et
al., Nature 317:692 (1985); D. Freije et al., Science
258:1184 (1992); G. A. Rappold, Hum. Genet. 92:315 (1993)).
Given the NRY's peculiar characteristics, one might expect
its gene content to be idiosyncratic. Since discovery of 
the Y chromosome in 1923, its gene content has been the 
subject of speculation. By the middle of this century,
while studies of human pedigrees had identified many traits
exhibiting autosomal or X-linked inheritance, no convincing
cases of Y-linked inheritance could be found (T. S.
Painter, J. Exp. Zool. (1923); C. Stern, Am. J. Hum. Genet.
9:147 (1957)). As a result, a consensus began to emerge that
the Y chromosome carried few, if any, genes. In 1959,
reports of XO females and XXY males established the
existence of a sex-determining gene on the human Y
chromosome (P. A. Jacobs et al., Nature 183:302 (1959); C.
E. Ford et al., Lancet i:711 (1959)), but this was
perceived as a special case on a generally desolate
chromosome. Opinions began to change only during the past
decade, when eight NRY transcription units (or families of
closely related transcription units) were identified, most
during regionally focused, positional cloning experiments
(D. C. Page et al., Cell 51:1091 (1987); A. H. Sinclair et
al., Nature 346:240-244 (1990); J. Arnemann et al.,
Genomics 11:108 (1991); E. C. Salido et al., Am. J. Hum.
Genet. 50:303 (1992); E. M. Fisher et al., Cell 63:1205
(1990); K. Ma et al., Cell 75:1287 (1993); A. I. Agulnik et
al., Hum. Mol. Genet. 3:879 (1994); R. Reijo et al., Nat.
Genet. 10:383 (1995)). It was not known whether the NRY
contained additional genes.

SUMMARY OF THE INVENTION 

A systematic search of the non-recombining region of 
the human Y chromosome (NRY) has identified 12 novel genes 
or gene families. All 12 novel genes, and six of eight NRY 
genes or families previously isolated by less systematic 
means, fall into two classes. The first class of genes 
exists in one copy and is expressed in many organs; they 
have functional X homologs that escape X inactivation, as 
predicted for genes involved in Turner (XO) syndrome. The 
second class consists of Y-chromosomal gene families
expressed specifically in testes, and may account for 
infertility among men with Y deletions. 

The genes described herein, portions of the genes and 
DNA which hybridizes to genes or gene portions described 
are useful in diagnostic methods, such as a method to 
identify individuals in whom all or a portion of a gene or 
genes of the NRY is missing or altered. For example, Y 
chromosomal DNA from males with a known condition, such as 
infertility or reduced sperm count, can be assessed, using 
the gene(s) described herein, or characteristic portions 
thereof, to determine whether their DNA lacks some or all 
of the gene(s) described herein or contains an altered 
gene(s) (e.g., a gene in which there is a deletion, 
substitution, addition or mutation, compared to the 
sequences presented herein). Y chromosomal DNA (e.g., from 
a male with reduced sperm count or viability) can be 
assessed, using DNA described herein or DNA which 
hybridizes to DNA described herein, to determine whether 
the condition is associated with or caused by the 
occurrence of the gene or the gene alteration. For
example, the presence or absence of all or a portion of a
gene or genes shown to be necessary for fertility or
adequate sperm count can be assessed, using DNA which
hybridizes to the gene or genes of interest, to determine
the basis for an individual's infertility or reduced sperm
count. In one embodiment, the occurrence of one or more
Y-specific genes or a characteristic portion of one or more
Y-specific genes is assessed in Y chromosomal DNA. In another
embodiment, deletion or alteration of one of the testis- 
specific (Y-specific) genes described is assessed, such as 
by a hybridization method in which DNA which hybridizes to 
one of the Y-specific genes described herein or a 
characteristic portion thereof is used to assess a DNA 
sample obtained from a male who has a reduced sperm count. 
Lack of hybridization of the Y-specific probe DNA to DNA in
the sample indicates that the gene is not present in the
sample DNA or is present in an altered form which does not
hybridize to Y-specific DNA of the present invention. In
another embodiment, an X-homologous gene or genes present 
on the NRY can be used to determine whether the gene is 
present in an individual or if it occurs in an altered form 
in the individual. Using known methods, such as 
hybridization methods, X or Y chromosomal DNA from an 
individual can be assessed for the presence or absence of 
one or more of the X-homologous genes or a characteristic 
portion of one or more X-homologous genes. X or Y
chromosomal DNA can also be assessed for the presence or 
absence of an altered form of one or more of the X- 
homologous genes described. In the present methods, DNA 
can be analyzed for the occurrence of Y-specific DNA, X- 
homologous genes or both. For example, a "battery" or 
group of DNA probes (sequences) can be used to analyze 
sample DNA; the probes can include Y-specific DNA probes 
(e.g., DNA which hybridizes to a Y-specific gene), X- 
homologous gene probes (e.g., DNA which hybridizes to an X- 
homologous gene) or both types of probes. DNA described 
herein is also useful as primers in an amplification
method, such as PCR, useful for identifying and amplifying
Y-specific DNA or X-homologous genes in a sample (e.g., Y
chromosomal DNA). Further, proteins or peptides encoded by
the DNA described herein, such as proteins or peptides
encoded by an X-homologous gene or proteins or peptides
encoded by testis-specific DNA (a testis-specific gene),
can be assessed in samples. This can be carried out, for
example, using antibodies which recognize proteins or
peptides of the present invention (proteins or peptides
encoded by DNA described herein).
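
The interpretation of a probe "battery" described above can be
sketched in a few lines of code. This is a minimal illustration
only: it assumes each probe yields a simple hybridized or
not-hybridized call, and the function name, the choice of probes
and the example results are hypothetical and are not data from
the work described herein.

```python
from typing import Dict, List


def missing_or_altered(hybridized: Dict[str, bool]) -> List[str]:
    """Return the probes that gave no hybridization signal, sorted by name.

    Per the text, lack of hybridization indicates that the gene is either
    absent from the sample DNA or present in an altered form that no longer
    hybridizes; this sketch does not distinguish the two cases.
    """
    return sorted(name for name, signal in hybridized.items() if not signal)


# Hypothetical results for one sample (illustrative values only).
sample = {"CDY": True, "BPY2": False, "DBY": True, "XKRY": False}
flagged = missing_or_altered(sample)
print(flagged)
```

A flagged probe would only identify a candidate for follow-up;
distinguishing a deletion from an alteration would require
further analysis of the sample DNA.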

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a gene map of the non-recombining region 
of the Y chromosome. 

Figure 2 shows the amino acid sequence alignments of
the chromodomain (SEQ ID NO.: 1-6) and putative catalytic
domain (SEQ ID NO.: 7-12) of human CDY genes with their
respective homologs. Amino acid identities are indicated
by black shading and, for each protein, the first and last
amino acid residues are numbered (with respect to the
initiator methionine) and the total length of the protein
is indicated. Chromodomain: SEQ ID NO.: 1, CDY (human);
SEQ ID NO.: 2, HP1 (Drosophila); SEQ ID NO.: 3, Polycomb
(Drosophila); SEQ ID NO.: 4, CHD1 (Drosophila); SEQ ID NO.:
5, Su(var)3-9 (Drosophila); SEQ ID NO.: 6, PDD1
(Tetrahymena); SEQ ID NO.: 7; Covalent modification domain:
SEQ ID NO.: 8, CDY (human); SEQ ID NO.: 9, Enoyl-CoA
Hydratase (human); SEQ ID NO.: 10, 4-CBA-CoA dehalogenase
(Arthrobacter); SEQ ID NO.: 11, Crotonase (C.
acetobutylicum); SEQ ID NO.: 12, Naphthoate synthase (E.
coli).

Figures 3A and 3B are the nucleic acid sequences of DBX
(long and short transcripts, SEQ ID NO: 13 and SEQ ID NO:
14, respectively) and the encoded amino acid sequences (SEQ
ID NO: 15 and SEQ ID NO: 16, respectively), DBY (SEQ ID
NO: 17) and the encoded amino acid sequence (SEQ ID NO:
18). Dots in the DBX DNA and protein sequences indicate
that the nucleic acids or amino acid residues are the same
as those represented for DBY; dashes indicate a missing
nucleic acid or amino acid residue.

Figures 4A and 4B present the nucleic acid sequences
for three forms of TPRY (short, medium and long, SEQ ID NO:
19, SEQ ID NO: 20 and SEQ ID NO: 21, respectively) and the
encoded amino acid sequences for the short, medium and long
forms (SEQ ID NO: 22, SEQ ID NO: 23 and SEQ ID NO: 24,
respectively).

Figure 5 presents the nucleic acid sequences of TB4X
(SEQ ID NO: 25) and TB4Y (SEQ ID NO: 26) and the encoded
amino acid sequences (SEQ ID NO: 27 and SEQ ID NO: 28,
respectively). Dots in the TB4X DNA and protein sequences
indicate that the nucleic acids or amino acid residues are
the same as those represented for TB4Y.

Figure 6 represents the nucleic acid sequences of
EIF1AX (SEQ ID NO: 29) and EIF1AY (SEQ ID NO: 30) and the
encoded amino acid sequences (SEQ ID NO: 31 and SEQ ID NO:
32, respectively).

Figures 7A - 7D represent the nucleic acid sequences
of DFFRX (SEQ ID NO: 33) and DFFRY (SEQ ID NO: 34) and the
encoded amino acid sequences (SEQ ID NO: 35 and SEQ ID NO:
36, respectively).

Figure 8 represents the nucleic acid sequences of CDYa
(SEQ ID NO: 37) and CDYb (SEQ ID NO: 38) and the encoded
amino acid sequences (SEQ ID NO: 39 and SEQ ID NO: 40,
respectively).

Figure 9 represents the nucleic acid sequence of BPY1
(SEQ ID NO: 41) and the encoded amino acid sequence (SEQ ID
NO: 42).

Figure 10 represents the nucleic acid sequence of BPY2
(SEQ ID NO: 43) and the encoded amino acid sequence (SEQ ID
NO: 44).

Figure 11 represents the nucleic acid sequence of
XKRY (SEQ ID NO: 45) and the encoded amino acid sequence
(SEQ ID NO: 46).

Figure 12 represents the nucleic acid sequence of
PTPRY (SEQ ID NO: 47) and the encoded amino acid sequence
(SEQ ID NO: 48).

Figure 13 is the nucleic acid sequence of TTY1 (SEQ ID
NO: 49).

Figure 14 is the nucleic acid sequence of TTY2 (SEQ ID
NO: 50).

Figure 15 shows the nucleic acid sequence of the human
CDY Like (CDYL) gene, which is the human autosomal homolog
of CDY, located on chromosome 6p and expressed
ubiquitously.

Figure 16 shows the nucleic acid sequence of the mouse
Cdyl (CDY like) gene, which is the mouse ortholog of human
CDYL, located on chromosome 13 and expressed predominantly
in the testis. A longer transcript of the gene is
ubiquitously expressed.

Figures 17A - 17C show the nucleic acid sequences of
human Variably Charged Protein family members VCP2r, VCP8r
and VCP10r, which are expressed in the testis and highly
polymorphic.

Figure 17A is the nucleic acid sequence of VCP2r.
Figure 17B is the nucleic acid sequence of VCP8r.
Figure 17C is the nucleic acid sequence of VCP10r.

DETAILED DESCRIPTION OF THE INVENTION 

Y chromosome genes, classed as genes having X
homologues and testis-specific (Y-specific) genes, are the
subject of the invention described herein, as are DNAs which
hybridize to (are complementary to) all or characteristic
portions of the Y chromosome genes, the encoded products
(e.g., proteins, peptides, glycoproteins), antibodies and
methods of diagnosis or treatment in which the genes,
complementary DNA, encoded proteins or antibodies are used.
As described herein, fragments that hybridized to Y
chromosomal DNA were selected and their nucleotide
sequences were then determined. It was expected that these
sequence fragments would represent a redundant sampling of a
much smaller set of genes. Computer analysis revealed that 577
fragments corresponded to known Y genes, including seven of
eight NRY genes and all eight pseudoautosomal genes
previously reported. These findings suggested that the
2539 sequence fragments represented the great majority of
all Y-chromosomal genes. After further analysis, both to
eliminate human repetitive sequences and to assemble
overlapping fragments into contigs, 912 novel and
non-overlapping sequences were hybridized to Southern blots
of human genomic DNAs. The 308 sequences that detected at
least one prominent male-specific fragment were judged
likely to derive from the NRY, and for each, work was
carried out to isolate cDNA clones from a human testis
library, as described in Example 1. Nucleotide sequencing
of cDNA clones, and rescreening of libraries as necessary,
yielded full-length cDNA sequences for ten novel NRY genes
or families, and partial cDNA sequences for two additional
ones (Table and Figures 1-14).
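
The proportions implied by the fragment counts above can be
worked out with a short calculation. The sketch below uses only
the counts stated in the text; the variable and function names
are illustrative assumptions, and the percentages it prints are
derived values, not figures reported herein.

```python
# Counts stated in the text.
TOTAL_FRAGMENTS = 2539        # sequence fragments obtained
KNOWN_GENE_FRAGMENTS = 577    # fragments corresponding to known Y genes
NOVEL_SEQUENCES = 912         # novel, non-overlapping sequences used on Southern blots
MALE_SPECIFIC = 308           # sequences detecting a prominent male-specific fragment


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


# Share of all fragments that matched known Y genes.
known_share = percent(KNOWN_GENE_FRAGMENTS, TOTAL_FRAGMENTS)
# Share of the tested novel sequences judged likely to derive from the NRY.
male_specific_share = percent(MALE_SPECIFIC, NOVEL_SEQUENCES)
print(known_share, male_specific_share)
```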



All 12 novel genes were localized on the Y chromosome,
as described in Example 2. Figure 1 is a gene map of the NRY.
As shown, the Y chromosome consists of a large
non-recombining region (NRY; euchromatin plus
heterochromatin) flanked by pseudoautosomal regions (pter,
short arm telomere; qter, long arm telomere). The NRY is
divided into 43 ordered intervals (1A1A through 7) which
are defined by naturally occurring deletions (D. Vollrath
et al., Science 258:52 (1992)). Listed immediately above
the Y chromosome in Figure 1 are nine NRY genes with
functional X homologs; novel genes are boxed. Indicated
immediately below the Y chromosome are 11 testis-specific
genes or families, some with multiple locations. It is
likely that some testis-specific families have members in
additional deletion intervals; the locations indicated are
representative, but are not necessarily exhaustive. At the
bottom of Figure 1 are shown NRY regions implicated, by
deletion mapping, in sex determination, germ cell
tumorigenesis (gonadoblastoma), stature, and spermatogenic
failure (K. Ma et al., Cell 75:1287 (1993); R. Reijo et
al., Nat. Genet. 10:383 (1995); P. H. Vogt et al., Hum.
Mol. Genet. 5:933 (1996); J. L. Pryor et al., New England
J. Med. 336:534 (1997); K. Tsuchiya et al., Am. J. Hum.
Genet. 57:1400 (1995); P. Salo et al., Hum. Genet. 95:283
(1995)). Euchromatic regions that are made up, at least
partially, of Y-specific repeats are drawn in grey. AMELY,
which appears to fall within such a repeat-containing
region, is actually located in a sub-region of 4A that is
not repetitive.

Expression of the 12 novel genes was assessed in
diverse human tissues by Northern blotting.
Autoradiograms were produced by hybridizing 32P-labeled
cDNA probes to Northern blots of poly(A)+ RNAs (2 µg/lane)
from human tissues (Clontech, Palo Alto, CA). Probes
employed were cDNA clones, full-length (most genes) or
partial (DBY, nucleotides 1476-2319 of GenBank AF000985;
TPRY, nucleotides 861-1768 of GenBank AF000996; DFFRY,
nucleotides 8604-9878 of GenBank AF000986). Blots were
hybridized at 65°C in Church's buffer (0.5 M NaPO4 at
pH 7.5, with 7% SDS), and washed at 65°C in 1X SSC and 0.1%
SDS. DBY, TB4Y, EIF1AY and DFFRY probes cross-hybridize to
transcripts derived from their X homologs. For all five
X-homologous genes (DBY, TPRY, TB4Y, EIF1AY and DFFRY),
expression was tested and confirmed in three male tissues
(brain, prostate and testis) by RT-PCR using Y-specific
primers.

The novel genes encode an assortment of proteins and
are dispersed throughout the euchromatic portions of the
NRY. Nonetheless, all 12 genes fall into two discrete
classes: 1) X-homologous genes and 2) testis-specific,
Y-specific gene families (Table).

The X-homologous genes share the following
characteristics: each has a homolog on the X chromosome
encoding an extremely similar but nonidentical protein
isoform, each is expressed in a wide range of human tissues
(is not testis-specific), and each appears to exist in a
single copy on the NRY. There are five novel
representatives of this X-homologous class:

1. DBY encodes a novel "DEAD box" protein, perhaps an RNA
helicase involved in translation initiation (P. Linder, et
al., Nature, 337, 121 (1989); R.-Y. Chuang, P. L. Weaver,
Z. Liu, T.-H. Chang, Science, 275, 1468 (1997)). The DBY
protein is 91% identical to DBX, encoded by a homologous
gene on the human X chromosome.

2. TPRY encodes a novel protein containing 10 tandem "TPR"
motifs, a protein-protein interaction domain found in the
products of the yeast SSN6/CYC8, CDC16, and CDC23 genes,
among others (R. S. Sikorski, M. S. Boguski, M. Goebl, P.
Hieter, Cell, 60, 307 (1990); D. Tzamarias, K. Struhl,
Genes Dev, 9, 821 (1995)). Differential splicing may
generate TPRY isoforms that differ at their carboxy
termini. The amino terminal portion of the TPRY protein is
83% identical to TPRX, encoded by a homologous gene on the
X chromosome.

3. TB4Y encodes a 44 amino acid protein that differs at
only three residues from thymosin β4, which functions in
actin sequestration (H. Gondo, et al., J. Immunol. 139:3840
(1987); D. Safer, M. Elzinga, V. T. Nachmias, J Biol Chem,
266, 4029 (1991)) and which we found to be encoded by a gene
located on the X chromosome. It is proposed that the
X-linked gene encoding thymosin β4 be called TB4X.

4. EIF1AY encodes a Y-linked isoform of translation
initiation factor 1A (eIF-1A) (T. E. Dever, et al., J Biol
Chem, 269, 3212 (1994); J. W. Hershey, Annu. Rev. Biochem.
60, 717 (1991)), whose gene we discovered is located on the
X chromosome. It is proposed that the X-linked gene encoding
eIF-1A be called EIF1AX. The amino acid sequences of the X-
and Y-encoded proteins are 97% identical.

5. DFFRY encodes a Y-linked isoform of DFFRX, a recently
described X-linked protein. A Y-linked homolog was
detected previously, but had been thought to be a
pseudogene. The human DFFRX and DFFRY proteins, which are
91% identical, are homologous to the Drosophila fat-facets
gene product, a deubiquitinating enzyme required for eye
development and oogenesis (M. H. Jones, et al., Hum Mol
Genet 5, 1695 (1996); J. A. Fischer-Vize, G. M. Rubin, R.
Lehmann, Development, 116, 985 (1992); Y. Huang, R. T.
Baker, J. A. Fischer-Vize, Science, 270, 1828 (1995)).
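
The percent identity figures quoted for the X- and Y-encoded
isoforms can be understood with the following sketch. It is an
illustration under stated assumptions, not the method used to
obtain the values above: the two sequences are assumed to be
already aligned to equal length with "-" marking gaps, identity
is computed over ungapped columns only, and the function names
and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b) if gap not in (a, b)]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)


def identity_from_differences(length: int, differences: int) -> float:
    """Percent identity for two equal-length proteins differing at a known number of residues."""
    if length <= 0 or not 0 <= differences <= length:
        raise ValueError("need length > 0 and 0 <= differences <= length")
    return 100.0 * (length - differences) / length


# Toy aligned peptides (hypothetical, not sequences from the figures).
toy = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
# A 44-residue protein differing at three residues, as stated for TB4Y.
tb4 = round(identity_from_differences(44, 3), 1)
print(toy, tb4)
```

Other conventions (for example, counting gapped columns as
mismatches) give different values, so any comparison of percent
identities should use a single convention throughout.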

The second group of novel NRY genes, the testis-
specific, Y-specific gene families, share a very different
set of characteristics: each appears to be expressed
specifically in testes and each appears to exist in
multiple copies on the NRY, as judged by i) the number and
intensity of hybridizing fragments on genomic Southern
blots or ii) multiple map locations on the Y. We report
five novel testis-specific, Y-specific gene families with
full-length cDNA sequences:

1. The CDY family encodes proteins with an amino-terminal
"chromodomain," a chromatin binding motif (T. C. James, S.
C. Elgin, Mol Cell Biol, 6, 3862 (1986); B. Tschiersch, et
al., EMBO J, 13, 3822 (1994); R. Paro, D. S. Hogness, Proc
Natl Acad Sci USA, 88, 263 (1991); D. G. Stokes, K. D.
Tartof, R. P. Perry, Proc Natl Acad Sci USA, 93, 7137
(1996); M. T. Madireddi, et al., Cell, 87, 75 (1996))
(Figure 2). The carboxy-terminal half shows striking amino
acid similarity, over a region of more than 200 residues,
to nearly the full length of several enzymes, both
prokaryotic and eukaryotic (M. Kanazawa, et al., Enzyme
Protein, 47, 9 (1993); A. Schmitz, K. H. Gartemann, J.
Fiedler, E. Grund, R. Eichenlaub, Appl. Environ. Microbiol.
58, 4068 (1992); Z. L. Boynton, G. N. Bennet, F. B.
Rudolph, J Bacteriol, 178, 3015 (1996); V. Sharma, K.
Suvarna, R. Meganathan, M. E. Hudspeth, J Bacteriol, 174,
5057 (1992); P. M. Palosaari, et al., J Biol Chem, 266,
10750 (1991)). The reactions catalyzed by these homologs
are diverse, but in each case the substrate contains
coenzyme A (CoA) attached to a carbonyl group, and an
alkoxide intermediate is formed. The unprecedented
combination of a chromodomain and a putative CoA-substrate
enzyme in a single polypeptide suggests that, in vivo, CDY
proteins may catalyze covalent modification of DNA or
chromosomal proteins, perhaps during spermatogenesis.

2. The BPY1 genes encode a basic protein, 125 residues
long, with little sequence similarity to known proteins.
The encoded protein is rich in serine, lysine, arginine,
and proline and has a pI of 9.4. Southern blotting studies
revealed homologous sequences on the human X chromosome,
but screening of cDNA libraries has failed to yield
X-derived clones.

3. The BPY2 genes encode a second basic protein, 106
residues in length, without obvious sequence similarity to
BPY1 or other known proteins. The pI of BPY2 is 10.0.

4. The XKRY genes encode a protein with sequence
similarity to XK, a putative membrane transport protein
defective in McLeod syndrome (M. Ho, et al., Cell, 77, 869
(1994)).

5. The PTPRY genes encode a protein with weak homology to
a putative protein-tyrosine phosphatase (PTPase) in the
mouse (W. Hendriks, et al., J. Cell Biochem, 59, 418
(1995)).

Two additional families of testis-specific
transcription units, referred to as TTY1 and TTY2, have
been identified. The sequences represented in Figures 13
and 14 are being assessed for open reading frames.

It appears that conventional single-copy genes,
commonplace elsewhere in the genome, are quite uncommon in
the NRY. Indeed, the two classes of NRY genes suggested by
the systematic search described herein accommodate not only
the 12 genes reported here, but also six of eight
previously identified NRY genes. The two exceptions are SRY
and AMELY. SRY, a Y-specific gene
that triggers the male pathway of sexual differentiation,
is expressed in testes, and exists in only one copy in the
NRY. AMELY, which has an X-linked homolog AMELX, is
expressed only in the developing tooth bud. The X
inactivation status of AMELX is unknown.

Also described herein are five additional genes and
their sequences (Figures 15, 16, 17A - 17C): human CDY
Like (CDYL), which is the human autosomal homolog of CDY,
located on chromosome 6p and expressed ubiquitously; mouse
Cdyl (CDY like), which is the mouse ortholog of human CDYL,
located on chromosome 13, expressed predominantly in testis
and also having a longer transcript that is expressed
ubiquitously; and the human VCP (Variably Charged Protein)
family, which is a family of genes on the X chromosome that
are homologous to BPY1, expressed in the testis and highly
polymorphic. Human CDY, human CDYL and mouse Cdyl have
been shown to be histone acetyltransferases by in vitro
assays. Human CDY is a candidate for the Azoospermia
Factor (AZF) because it is within the AZFc region that is
commonly deleted in infertile men. Chemicals that block
the enzymatic activity of the proteins encoded by any of
these genes are candidate male contraceptives.
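
The VCP family members named above (VCP2r, VCP8r and VCP10r)
are distinguished in Figures 17A - 17C by their number of
repeats. One way to estimate a repeat count from a nucleotide
sequence is sketched below. It is an assumption-laden
illustration: the 10-nucleotide core motif and the 30-nucleotide
unit are read off the listed sequences for the purpose of this
example and are not defined as the repeat unit anywhere in the
text, and the toy sequence is constructed, not taken from a
figure.

```python
def count_repeat_units(sequence: str, core_motif: str) -> int:
    """Count non-overlapping occurrences of core_motif in sequence (case-insensitive)."""
    if not core_motif:
        raise ValueError("core_motif must be non-empty")
    return sequence.lower().count(core_motif.lower())


# Assumed for illustration: one repeat unit and a core motif found once per unit.
UNIT = "ctgagtcaggagagcgaggtggaagaacca"
CORE = "gagtcaggag"

# Toy construct with eight tandem copies of the unit between short flanks.
toy_sequence = "atg" + UNIT * 8 + "taa"
n_repeats = count_repeat_units(toy_sequence, CORE)
print(n_repeats)
```

Because individual repeat units in the listed sequences differ
at a few positions, a count based on one exact motif is only a
rough estimate and would miss units in which the motif itself
is altered.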

Inhibitors of the enzymatic activity of the proteins
encoded by these genes, such as the human CDY gene, can be
identified through an in vitro assay. For example, the
protein encoded by one of
the genes (e.g., CDY-encoded protein) can be produced, such
as by recombinant means (e.g., in bacterial cells
containing a vector or plasmid which includes the gene to
be expressed), and obtained. The effect of a candidate
inhibitor (drug) on the enzymatic activity of the protein
can be assessed by combining the candidate inhibitor with
the protein, a substrate of its enzymatic activity (e.g.,
histones), acetyl CoA (e.g., radiolabelled acetyl CoA) and
other assay components (e.g., an appropriate physiological
solution or buffer), to produce a combination. The
combination is maintained under conditions under which the
enzymatic activity of the protein is maintained and which
are appropriate for the protein to act upon/interact with its
substrate (e.g., for the CDY-encoded protein to retain its
histone acetyltransferase activity). As a result, the
substrate is acted upon by the protein if the candidate
inhibitor does not inhibit the protein. If the substrate is
not acted upon by the
protein, this is an indication that the candidate inhibitor
is an inhibitor of the protein. For example, if a histone
acetyltransferase, such as CDY-encoded protein, is inhibited
by a candidate inhibitor, its histone acetyltransferase
activity will be blocked. If radiolabelled acetyl CoA is
used, transfer of the radiolabelled acetyl group to the
enzyme substrate (histones) is inhibited (will not occur or
will occur to a lesser extent than occurs in the absence of
the candidate inhibitor). Whether transfer occurs can be
assessed by determining the location of radiolabelled
acetyl groups from acetyl CoA. If the histone substrates
are not radiolabelled or are radiolabelled to a lesser
extent in the presence of a candidate inhibitor (than in
its absence), the candidate inhibitor is an inhibitor of
the protein. Inhibitors identified in this way can be
further assessed in additional in vitro assays or in in
vivo assays (e.g., in an appropriate animal model).
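
The comparison of radiolabel transfer in the presence and
absence of a candidate inhibitor, as described above, can be
expressed as a percent inhibition. The following is a minimal
sketch under stated assumptions: the readout is taken to be
radioactive counts recovered in the histone substrate, a
background value may be subtracted, and the function names, the
50% threshold and the example counts are hypothetical choices
made for illustration only.

```python
def percent_inhibition(control_counts: float, treated_counts: float,
                       background_counts: float = 0.0) -> float:
    """Percent reduction in label transferred to substrate, relative to the no-inhibitor control."""
    control = control_counts - background_counts
    treated = treated_counts - background_counts
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - treated / control)


def is_candidate_inhibitor(control_counts: float, treated_counts: float,
                           background_counts: float = 0.0,
                           threshold_percent: float = 50.0) -> bool:
    """Flag a compound whose inhibition meets an (assumed) threshold."""
    inhibition = percent_inhibition(control_counts, treated_counts, background_counts)
    return inhibition >= threshold_percent


# Hypothetical counts: 1000 without the candidate, 250 with it.
example = percent_inhibition(1000.0, 250.0)
print(example, is_candidate_inhibitor(1000.0, 250.0))
```

A compound flagged in this way would, as stated above, still
need to be assessed further in additional in vitro or in vivo
assays.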

To interpret the observation that these X-homologous
and multi-copy, testis-specific groups account for 18 of 20
known NRY genes or families, we postulate that the NRY's
evolution was dominated by two strategies. The first
strategy favors conservation of certain existing genes and
the second favors the acquisition of a class of novel
genes: 1) The X-homologous genes probably reflect the
common ancestry of the X and Y chromosomes, and selective
pressures to maintain comparable expression of genes in
males and females. 2) The abundance of testis-specific
families may have resulted from the NRY's selectively
retaining and amplifying genes that enhance male
reproductive fitness.

1) Dosage compensation and X-Y homology. It is generally
accepted that the mammalian X and Y chromosomes evolved from
autosomes, with nearly all ancestral gene functions
deteriorating on the non-recombining portion of the
emerging Y chromosome while being maintained on the nascent
X chromosome (J. J. Bull, Evolution of Sex Determining
Mechanisms (Benjamin Cummings, Menlo Park, CA, 1983); J. A.
Graves, Annu. Rev. Genet. 30:233 (1996); B. Charlesworth,
Curr. Biol. 6:149 (1996); W. R. Rice, Bioscience 46:331
(1996)). Functional degeneration of the NRY would result
in females having two, but males only one, copy of many
genes, creating the need for a mechanism to equalize
X-linked gene expression in the sexes. In mammals, a
predominant solution to this problem is provided by X
inactivation, the transcriptional silencing of one X
chromosome in females.

However, the findings on X-homologous NRY genes
described herein, combined with previous studies,
illustrate the importance in human evolution of an
alternative solution: preservation of homologous genes on
both the NRY and the X chromosome, with both male and
female cells expressing two copies of such genes. A
critical prediction of this model is that, in female cells,
the X homologs should escape X inactivation. This is the
case for all widely expressed X-linked genes with known NRY
homologs, including the X homologs of five novel NRY genes
reported here (E. M. Fisher et al., Cell 63:1205 (1990);
A. I. Agulnik et al., Hum. Mol. Genet. 3:879 (1994); M. H.
Jones et al., Hum. Mol. Genet. 5:1695 (1996); J. A.
Fischer-Vize et al., Development 116:985 (1992); Y. Huang
et al., Science 270:1828 (1995); A. Schneider-Gadicke et
al., Cell 57:1247 (1989)). A second prediction of this
model is that the human X- and Y-encoded proteins should be
functionally interchangeable even though the nucleotide
sequences of their corresponding genes are considerably
diverged. Indeed, each of the eight known X-NRY gene pairs
encodes closely related isoforms, with 83 to 97% amino acid
identity throughout their lengths; functional
interchangeability has been demonstrated in the one case
tested to date (M. Watanabe et al., Nat. Genet. 4:268
(1993)).

Turner syndrome is classically associated with an XO
sex chromosome constitution. In 1965, Ferguson-Smith
postulated that the Turner phenotype might be due to
inadequate expression of X-Y common genes that escape X
inactivation (M. A. Ferguson-Smith, J. Med. Genet. 2:142
(1965)). These "Turner genes" have yet to be identified
with certainty. However, there now exists a substantial
collection of X-homologous NRY genes (Figure 1) which can
be assessed for genes which contribute to or are
responsible for the Turner phenotype. The potential role
of RPS4Y and RPS4X in Turner syndrome is controversial (E.
M. Fisher et al., Cell 63:1205 (1990); W. Just et al., Hum.
Genet. 89:240 (1992)). At least one Turner gene maps to
the Xp-Yp pseudoautosomal region (T. Ogata et al., J. Med.
Genet. 30:918 (1993)). Seven of the eight known X-NRY gene
pairs appear to be ubiquitously expressed, and at least
three encode housekeeping proteins: an essential ribosomal
protein (RPS4), an essential translation initiation factor
(eIF-1A), and a modulator of actin polymerization (thymosin
β4). Perhaps some features of the XO phenotype (e.g., poor
fetal viability) reflect inadequate expression of such
housekeeping functions.

2) Male fitness and Y-specific, testis-specific
genes. As first appreciated by R. A. Fisher, animal genomes
may contain genes or alleles that enhance male reproductive
fitness but are inconsequential or detrimental with respect
to female fitness (R. A. Fisher, Biol. Rev. 6:345 (1931)).
As Fisher recognized, selective pressures would tend to
favor the accumulation of such genes in male-specific
regions of genomes. Of course, male reproductive fitness
depends critically on sperm production, the central task of
the adult testis. Since the NRY is the only male-specific
portion of the mammalian genome, it should have a unique
tendency to accumulate male-benefit genes during evolution.
These principles are illustrated by several gene
families on the human NRY. De novo deletions of the DAZ
gene cluster on the human Y chromosome are associated with
severe spermatogenic defects (R. Reijo et al., Nat. Genet.
10:383 (1995)), and in Drosophila the DAZ homolog boule is
required for spermatogenesis (C. G. Eberhart et al., Nature
381:183 (1996)). The DAZ gene cluster on the human Y
chromosome arose, during primate evolution, by
transposition and amplification of an autosomal gene.
Likewise, two other testis-specific NRY gene families
(YRRM and TSPY) may also be the result of the Y
chromosome's having acquired and amplified autosomal genes
(R. Saxena et al., Nat. Genet. 14:292 (1996); M. L.
Delbridge et al., Nat. Genet. 15:131 (1997)). It is
possible that the selective advantage conferred by the
NRY's retaining and amplifying male fertility factors (from
throughout the genome) accounts for the multitude of
testis-specific gene families there. This may have been
the preeminent force in shaping the NRY's gene repertoire,
as it appears that the great majority of NRY transcription
units are members of such testis-specific families. In the
NRY, each of the testis-specific gene families has multiple
members, 20 to 40 copies in the case of TSPY (E. Manz et
al., Genomics 17:726 (1993)), and perhaps as many as 20
copies in the case of YRRM (K. Ma et al., Cell 75:1287
(1993)). All together, the various Y-specific gene
families may include as many as several hundred genes or
copies. Though it is not known how many of these are
functional, it seems likely that Y-specific,
testis-specific gene families comprise the great majority
of NRY transcription units.

Recent genetic studies underscore the importance of
the human Y chromosome in fertility. Many men with
spermatogenic failure, but who are otherwise healthy, have
deletions of portions of the NRY (K. Ma et al., Cell
75:1287 (1993); R. Reijo et al., Nat. Genet. 10:383 (1995);
P. H. Vogt et al., Hum. Mol. Genet. 5:933 (1996); J. L. Pryor
et al., New England J. Med. 336:534 (1997)). These
findings suggested the existence of NRY genes that play
critical roles in male germ cell development but are not
required elsewhere in the body. Previous deletion mapping
studies have implicated four regions of the NRY in either
spermatogenic failure or germ cell tumorigenesis, and in
each of these four regions we now report novel candidate
genes expressed specifically, or most abundantly, in testes
(Figure 1). As shown in Figure 1, the regions implicated in
gonadoblastoma, stature and spermatogenic failure all
contain novel candidate genes. Two of the three regions
implicated in spermatogenic failure each contain one or
more novel testis-specific genes. The third region
implicated in spermatogenic failure (intervals 5B-5D)
contains two X-homologous genes, DBY and EIF1AY, with
abundant, testis-specific transcripts in addition to
higher-molecular-weight, ubiquitous transcripts.

While X-homologous and testis-specific genes are
somewhat intermingled within the NRY, clustering is evident
(Figure 1). The geographic distribution of the two classes
correlates quite well with previously identified sequence
domains within the euchromatic NRY (D. Vollrath et al.,
Science 258:52 (1992); S. Foote et al., Science 258:60
(1992)). Ten of the 11 known testis-specific families map
to previously identified regions of Y-specific repetitive
sequences. The only exception is BPY1, which
cross-hybridizes to the X chromosome and maps to a
previously recognized region of X homology. Indeed, one or
more testis-specific gene families are found in nearly all
known regions of euchromatic Y repeats (Figure 1).
Ironically, it had been widely assumed that these regions
consisted of "junk" DNA, partly on theoretical grounds (B.
Charlesworth, Science 251:1030 (1991); E. Seboun et al.,
Cold Spring Harb. Symp. Quant. Biol. 1:237 (1986)). To the
contrary, the results presented here argue that these
Y-specific repetitive regions contain the great majority of
the NRY's transcription units (the only exception is BPY1,
which cross-hybridizes to the X chromosome and maps to a
previously recognized region of X homology). These regions
may be the result of rampant gene amplification during
mammalian evolution. By contrast, none of the eight
X-homologous genes map to the Y-repeat regions; all eight
map to regions previously identified as consisting largely
of single-copy (or in some cases X-homologous) sequences.
It is possible that, early in mammalian evolution, these
regions of the NRY shared extensive sequence identity with
the nascent X chromosome. The stage is now set for
systematic evolutionary, biochemical and cell biological
studies of the NRY, an idiosyncratic segment of the human
genome.

The present invention relates to isolated DNA and
genes, present on (which occur on) the Y chromosome, whose
sequences are provided herein, as well as characteristic
portions of the DNA. It relates to additional nucleic
acid/nucleotide sequences which are not identical to the
sequences presented herein but include substitutions or
differences; DNA which includes substitutions or
differences and encodes the same amino acid sequence as a
DNA whose sequence is provided herein or includes
substitutions which do not alter the ability of a DNA probe
or primer which hybridizes to DNA whose sequence is
presented herein to hybridize to the DNA containing the
substitutions or differences. It further relates to DNA
which encodes a protein or peptide whose sequence is
presented herein. The present invention also includes the
complements of the DNA sequences presented herein, DNA
which hybridizes under stringent (high stringency)
conditions to the DNA whose sequences are presented and to
RNA transcripts. The invention further relates to encoded
proteins, peptides and other products (e.g., glycoproteins)
and antibodies which are raised against or bind to proteins
or peptides whose amino acid sequences are presented herein
or are encoded by DNA whose sequences are provided. As
used herein, the term isolated DNA which occurs on the
non-recombining region of the human Y chromosome refers to DNA
which has been obtained or removed from the human Y
chromosome or DNA, produced by any means (e.g., recombinant
techniques, synthetic methods), which has the sequence of
such Y chromosome DNA. For example, isolated
testis-specific DNA or isolated testis-specific DNA which occurs
on the non-recombining region of the human Y chromosome is
DNA which has been obtained or removed from the
non-recombining region of the human Y chromosome or which has
the sequence of such DNA and has been obtained or produced
by any means.

Thus, this invention has application to several areas. 
It may be used diagnostically to identify males with 
reduced sperm count in whom a gene has been deleted or 
altered. It may also be used therapeutically in gene 
therapy treatments to remedy fertility disorders associated 
with deletion or alteration of a gene described. In one 
embodiment of a gene therapy method, a gene described 
herein, or a gene portion which encodes a functional 
protein, is introduced into a man whose sperm count is 
reduced and in whom the gene is expressed and the encoded 
protein replaces the protein normally produced or enhances 
the quantity produced. The present invention may also be 
useful in designing or identifying agents which function as 
a male contraceptive by inducing reduced sperm count. This 
invention also has application as a research tool, as the 
nucleotide sequences described herein have been localized 
to regions of the Y chromosome. 

The present invention includes nucleotide sequences
described herein, and their complements, which are useful
as hybridization probes or primers for an amplification
method, such as polymerase chain reaction (PCR), to show
the presence, absence or disruption of the gene of the
present invention. Probes and primers can have all or a
portion of the nucleotide sequence (nucleic acid sequence)
of a gene described herein or all or a portion of its
complement. For example, sequences shown in the Figures or
Example 2 (SEQ ID NOS.: 1-84), as well as the complements
thereof, can be used. The probes and primers can be any
length, provided that they are of sufficient length and
appropriate composition (appropriate nucleotide sequence)
to hybridize to all or an identifying or characteristic
portion of the gene described or to a disrupted form of the
gene, and remain hybridized under the conditions used.
Useful probes include, but are not limited to, nucleotide
sequences which distinguish between a gene described herein
and an altered form of that gene shown to be associated
with reduced sperm count (azoospermia, oligospermia).
Generally, the probe will be at least 7 nucleotides, while
the upper limit is the length of the gene itself, e.g., up
to about 40,000 nucleotides in length. Probes can be, for
example, 10 to 14 nucleotides or longer (e.g., 20, 30, 50,
100, 250 nucleotides or any other useful length); the
length of a specific probe will be determined by the assay
in which it is used.
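
The relationship between a probe and its complement, and the length bounds stated above, can be illustrated with a short sketch. This is only an illustration under stated assumptions: the helper names are hypothetical, the length check simply encodes the "at least 7" and "up to about 40,000" figures from the paragraph above, and the example sequence is the DBY left primer listed in Example 2 (SEQ ID NO.: 51).

```python
# Illustrative sketch only; helper names are hypothetical.
MIN_PROBE_LEN = 7        # lower bound stated in the text
MAX_PROBE_LEN = 40_000   # approximate upper bound stated in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    # Complement each base, then reverse to read 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


def has_acceptable_probe_length(seq: str) -> bool:
    """Check a candidate probe against the length range given in the text."""
    return MIN_PROBE_LEN <= len(seq) <= MAX_PROBE_LEN


dby_left = "CATTCGGTTTTACCAGCCAG"  # SEQ ID NO.: 51, from Example 2
print(reverse_complement(dby_left))
print(has_acceptable_probe_length(dby_left))
```

Length alone does not make a probe useful; as the text notes, composition and the assay conditions also determine whether it remains hybridized.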
In one embodiment, the present invention is a method
of diagnosing or aiding in the diagnosis of reduced sperm
count associated with deletion or alteration of a gene
described herein. Any man may be assessed with this method
of diagnosis. In general, the man will have been at least
preliminarily assessed, by another method, as having a
reduced sperm count. By combining nucleic acid probes
derived either from the isolated native sequence or cDNA
sequence of the gene, or from appropriate primers, with the
DNA from a sample to be assessed, under conditions suitable
for hybridization of the probes with unaltered
complementary nucleotide sequences in the sample but not
with altered complementary nucleotide sequences, it can be
determined whether the man possesses the intact gene. If
the gene is unaltered, it may be concluded that the
alteration of the gene is not responsible for the reduced
sperm count. This invention may also be used in a similar
method wherein the hybridization conditions are such that
the probes will hybridize only with altered DNA and not
with unaltered sequences. The hybridized DNA can also be
isolated and sequenced to determine the precise nature of
the alteration associated with the reduced sperm count.
DNA assessed by the present method can be obtained from a
variety of tissues and body fluids, such as blood or semen.
In one embodiment, the above methods are carried out on DNA
obtained from a blood sample.

The invention also provides expression vectors
containing a nucleotide (nucleic acid) sequence described
herein, which is operably linked to at least one regulatory
sequence. "Operably linked" is intended to mean that the
nucleotide sequence is linked to a regulatory sequence in a
manner which allows expression of the nucleotide sequence.
The term "regulatory sequence" includes promoters,
enhancers, and other expression control elements (see,
e.g., Goeddel, Gene Expression Technology: Methods in
Enzymology 185, Academic Press, San Diego, CA (1990)). It
should be understood that the design of the expression
vector may depend on such factors as the choice of the host
cell to be transformed and/or the protein or peptide
desired to be expressed. For instance, the peptides of the
present invention can be produced by ligating the cloned
gene, or a portion thereof, into a vector suitable for
expression in either prokaryotic cells, eukaryotic cells or
both (see, for example, Broach et al., Experimental
Manipulation of Gene Expression, ed. M. Inouye (Academic
Press, 1983) p. 83; Molecular Cloning: A Laboratory
Manual, 2nd Ed., ed. Sambrook et al. (Cold Spring Harbor
Laboratory Press, 1989) Chapters 16 and 17).

Prokaryotic and eukaryotic host cells transfected by
the described vectors are also provided by this invention.
For instance, cells which can be transfected with the
vectors of the present invention include, but are not
limited to, bacterial cells such as E. coli, insect cells
(baculovirus), yeast and mammalian cells, such as Chinese
hamster ovary cells (CHO).
Thus, a nucleotide sequence described herein can be
used to produce a recombinant form of the protein via
microbial or eukaryotic cellular processes. Production of
a recombinant form of the protein can be carried out using
known techniques, such as by ligating the oligonucleotide
sequence into a DNA or RNA construct, such as an expression
vector, and transforming or transfecting the construct into
host cells, either eukaryotic (yeast, avian, insect or
mammalian) or prokaryotic (bacterial cells). Similar
procedures, or modifications thereof, can be employed to
prepare recombinant proteins according to the present
invention by microbial means or tissue-culture technology.

The present invention also pertains to pharmaceutical
compositions comprising the proteins and peptides described
herein. For instance, the peptides or proteins of the
present invention can be formulated with a physiologically
acceptable medium to prepare a pharmaceutical composition.
The particular physiological medium may include, but is not
limited to, water, buffered saline, polyols (e.g.,
glycerol, propylene glycol, liquid polyethylene glycol) and
dextrose solutions. The optimum concentration of the
active ingredient(s) in the chosen medium can be determined
empirically, according to procedures well known to
medicinal chemists, and will depend on the ultimate
pharmaceutical formulation desired. Methods of
introduction of exogenous polypeptides at the site of
treatment include, but are not limited to, intradermal,
intramuscular, intraperitoneal, intravenous, subcutaneous,
oral and intranasal. Other suitable methods of
introduction can also include rechargeable or biodegradable
devices and slow release polymeric devices. The
pharmaceutical compositions of this invention can also be
administered as part of a combinatorial therapy with other
agents.

This invention also has utility in methods of treating
disorders of reduced sperm count associated with deletion
or alteration of a gene described herein. These genes may
be used in a method of gene therapy, whereby the gene or a
gene portion encoding a functional protein is inserted into
cells in which the functional protein is expressed and from
which it is generally secreted to remedy the deficiency
caused by the defect in the native gene.

The present invention is also related to antibodies
which bind a protein or peptide encoded by all or a portion
of a gene of the present invention, as well as antibodies
which bind the protein or peptide encoded by all or a
portion of a disrupted form of the gene. For instance,
polyclonal and monoclonal antibodies which bind to the
described polypeptide or protein are within the scope of
the invention. A mammal, such as a mouse, hamster or
rabbit, can be immunized with an immunogenic form of the
protein or peptide (an antigenic fragment of the protein or
peptide which is capable of eliciting an antibody
response). Techniques for conferring immunogenicity on a
protein or peptide, including conjugation to carriers, are
well known in the art. The protein or
peptide can be administered in the presence of an adjuvant.
The progress of immunization can be monitored by detection
of antibody titers in plasma or serum. Standard ELISA or
other immunoassays can be used with the immunogen as
antigen to assess the levels of antibody.

Following immunization, anti-peptide antisera can be
obtained, and if desired, polyclonal antibodies can be
isolated from the serum. Monoclonal antibodies can also be
produced by standard techniques which are well known in the
art (Koehler and Milstein, Nature 256:495-497 (1975);
Kozbor et al., Immunology Today 4:72 (1983); and Cole et
al., Monoclonal Antibodies and Cancer Therapy, Alan R.
Liss, Inc., pp. 77-96 (1985)). Such antibodies are useful
as diagnostics for the intact or disrupted gene and also as
research tools for identifying either the intact or
disrupted gene.

The present invention is illustrated by the following 
examples, which are not intended to be limiting in any way. 

EXAMPLE 1 ISOLATION OF cDNA CLONES FROM HUMAN TESTIS

"cDNA selection" (M. Lovett et al., Proc. Natl. Acad.
Sci. USA 88:9628 (1991)) was carried out using bulk cDNA
prepared from human adult testes (Clontech, Palo Alto, CA)
and, as selector, a cosmid library prepared from
flow-sorted Y chromosomes (Lawrence Livermore National
Laboratory: LL0YNC03). A total of 3600 random cosmids,
providing nearly five-fold coverage of the 30-Mb
euchromatic region, were used to generate 150 pools of
selector DNA. Using each of the 150 selector pools, we
carried out four successive rounds of cDNA selection,
followed by two rounds of subtraction with human COT-1 DNA
(Gibco BRL, Gaithersburg, MD) to remove highly repetitive
sequences. A plasmid library was prepared from each of the
150 resulting pools of selected cDNA fragments, and 24
clones from each library were sequenced from one end. Of
the 3600 sequences generated, about 600 were of poor
technical quality and about 500 were found to derive from
cloning vector or E. coli host, leaving 2539 sequences for
further analysis. Of the 2539 sequence fragments, 536
corresponded to previously reported NRY genes (487 to TSPY,
15 to YRRM, 14 to RPS4Y, 9 to SMCY, 5 to DAZ, 3 to SRY, 3
to ZFY) and 41 corresponded to previously reported
pseudoautosomal genes (15 to XE7, 11 to CSF2RA, 4 to IL3RA,
3 to ASMT, 3 to IL9R, 2 to ANT3, 2 to MIC2, 1 to SYBL1).
Electronic analysis of the roughly 2000 remaining sequences
revealed that about 200 contained known repetitive
elements, and these were not pursued. By electronically
identifying redundancies and sequence overlaps, the
remaining sequences were reduced to 1093 sequence contigs.
Sequences representing these 1093 contigs were individually
hybridized to dot-blotted yeast genomic DNAs of 60 YACs
comprising most of the Y's euchromatic region (S. Foote et
al., Science 258:60 (1992)). 181 sequences that hybridized
to the great majority of the YACs were judged likely to
contain highly repeated elements and were not pursued,
leaving 912 sequences for further analysis. The 912
sequences were individually hybridized to Southern blots of
EcoRI-digested human 46,XX female and 49,XYYYY male (L. Sirota
et al., Clin. Genet. 19:87 (1981)) genomic DNAs. Blots
were hybridized at 65°C in Church's buffer (0.5 M NaPO4 at
pH 7.5, with 7% SDS), and washed at 65°C in 1X SSC and 0.1%
SDS, with 832 hybridizations yielding interpretable
results. Many sequences appeared to contain highly
repeated elements common to males and females, or failed to
detect an unambiguously Y-specific restriction fragment,
and these were not pursued. By contrast, 308 sequences
hybridized to at least one prominent fragment present in
49,XYYYY but absent in 46,XX, suggesting that these
sequences derived from the NRY. Each of these 308
sequences was individually used to screen, by
hybridization, about 2 million plaques from a λ phage
library of human adult testis cDNA (Clontech, Palo Alto,
CA).
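
The counts reported in Example 1 can be cross-checked with simple arithmetic. The sketch below tallies them using only figures given in the text, with one exception that is an assumption made here for illustration: an average cosmid insert of about 40 kb, which is not stated in the source and is used only to show how "nearly five-fold coverage" could arise. All variable names are illustrative.

```python
# Illustrative tally of the Example 1 counts; variable names are hypothetical.
pools = 150
clones_per_library = 24
total_reads = pools * clones_per_library          # sequences generated

# Hits to previously reported genes, as listed in the text.
nry_hits = {"TSPY": 487, "YRRM": 15, "RPS4Y": 14, "SMCY": 9,
            "DAZ": 5, "SRY": 3, "ZFY": 3}
par_hits = {"XE7": 15, "CSF2RA": 11, "IL3RA": 4, "ASMT": 3,
            "IL9R": 3, "ANT3": 2, "MIC2": 2, "SYBL1": 1}

usable_reads = 2539                               # after quality/vector filtering
remaining = usable_reads - sum(nry_hits.values()) - sum(par_hits.values())

contigs = 1093
repeat_like = 181                                 # hybridized to most YACs
for_southern = contigs - repeat_like

# ASSUMPTION (not in the source): ~40 kb average cosmid insert.
cosmids = 3600
assumed_insert_kb = 40
euchromatin_kb = 30_000                           # 30 Mb euchromatic region
coverage = cosmids * assumed_insert_kb / euchromatin_kb

print(total_reads, sum(nry_hits.values()), sum(par_hits.values()))
print(remaining, for_southern, round(coverage, 1))
```

Under these figures the remainder after removing known NRY and pseudoautosomal genes is 1962, consistent with the "roughly 2000" stated in the text.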

EXAMPLE 2 LOCALIZATION OF 12 NOVEL GENES ON THE Y 
CHROMOSOME 

Genes were localized on a previously reported NRY
deletion map by testing with PCR for their presence or
absence in individuals carrying partial Y chromosomes (D.
Vollrath et al., Science 258:52 (1992)). Most genes were
localized to a single deletion interval. Some genes could
not be unambiguously placed because copies exist in
multiple locations in the NRY. In such cases, genes were
localized by PCR testing of YACs encompassing the NRY's
euchromatic region (S. Foote et al., Science 258:60
(1992)). X homologs of Y genes were mapped onto the X by
PCR testing a panel of human/rodent somatic hybrid cell
lines (Research Genetics, Huntsville, AL). All PCR assays
consisted of 30 cycles of the following conditions: 1 min
denaturing at 94°C, 45 sec annealing at 60°C, and 45 sec
extension at 72°C. TB4X primers were designed from an
unreported intron. TPRX primers were designed from
unreported cDNA sequence. All other primers were designed
from cDNA sequences as submitted to Genbank. PCR primers
were as follows:



GENE       LEFT PRIMER (SEQ ID NO.)         RIGHT PRIMER (SEQ ID NO.)
DBY        CATTCGGTTTTACCAGCCAG (51)        CAGTGACTCGAGGTTCAATG (52)
TPRY       GCATCATAATATGGATCTAGTAGG (53)    GGAGATACTGAATAGCATAGC (54)
TB4Y       CAAAGACCTGCTGACAATGG (55)        CTCCGCTAAGTCTTTCACC (56)
EIF1AY     CTCTGTAGCCAGCCTCTTC (57)         GACTCCTTTCTGGCGGTTAC (58)
DFFRY      GAGCCCATCTTTGTCAGTTTAC (59)      CTGCCAATTTTCCACATCAACC (60)
CDY        GGCTCAAAATCCACTGACG (61)         CAAGCGATATCTCACCACC (62)
BPY1       CTCCCTGAGCAGCAACTAAG (63)        GTCATCAACATGGGAAGCAC (64)
BPY2       CCAGGACCATGTGATATGG (65)         CTAATTCCCTCTTTACGCATGACC (66)
[missing]  CACTCATGGAGAAGGGTAGG (67)        GTCACACTCAGCCTCTTTAC (68)
[missing]  GAGCACACCACACCAGAAAC (69)        CTCAGACTGACCTCGGACTG (70)
TTY1       CTCTGGGAATCAAATTCGAGG (71)       GTCTTTCAGCCAATCCAAGG (72)
TTY2       GACAACTCTGACAGCCAGG (73)         GTCAGAACTCCCAAACAGG (74)
DBX        CTACATGCAGATGACATGGTG (75)       GGCCAAGGTGCATAGGTG (76)
TPRX       CATGTTCCCTGTAGCACATC (77)        CGTTTCCATTACTTCCATTTCCTG (78)
TB4X       CCCGCCCTTTCATCATCC (79)          GCTCCCCAAAGTAGCCTTC (80)
EIF1AX     CACGAGGCGCCATTTGCTG (81)         CTGGAGGCCAGGCAACGTG (82)
DFFRX      CCTCCACCTGAAGATGCC (83)          CTGAGATCCAGGTGAATGG (84)

PCT/US98/07115
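
As a rough plausibility check on the primers and cycling conditions given in Example 2, one can estimate a primer's melting temperature and the total hold time of the PCR program. The sketch below uses the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is an approximation chosen here for illustration and is not a method stated in the source; the function names are hypothetical. The hold-time figure ignores ramping between temperatures.

```python
# Illustrative sketch; the Wallace rule is a rule of thumb chosen here,
# not a calculation described in the source.
PRIMERS = {
    # gene: (left primer, right primer), copied from Example 2
    "DBY": ("CATTCGGTTTTACCAGCCAG", "CAGTGACTCGAGGTTCAATG"),
    "CDY": ("GGCTCAAAATCCACTGACG", "CAAGCGATATCTCACCACC"),
}


def gc_count(seq: str) -> int:
    """Number of G and C bases in the sequence."""
    return seq.count("G") + seq.count("C")


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (deg C) for a short A/C/G/T oligo."""
    gc = gc_count(seq)
    at = len(seq) - gc          # assumes the sequence contains only A, C, G, T
    return 2 * at + 4 * gc


# Cycling conditions from the text: 30 cycles of 1 min / 45 sec / 45 sec.
CYCLES = 30
STEP_SECONDS = {"denature_94C": 60, "anneal_60C": 45, "extend_72C": 45}


def total_hold_minutes(cycles: int = CYCLES, steps: dict = STEP_SECONDS) -> float:
    """Total programmed hold time in minutes, excluding temperature ramps."""
    return cycles * sum(steps.values()) / 60


for gene, (left, right) in PRIMERS.items():
    print(gene, len(left), wallace_tm(left), len(right), wallace_tm(right))
print(total_hold_minutes())
```

More accurate melting-temperature estimates use nearest-neighbor thermodynamics and depend on salt and primer concentrations, so these values should be read only as a coarse guide.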



EQUIVALENTS 

Those skilled in the art will recognize, or be able 
to ascertain using no more than routine experimentation, 
many equivalents to the specific embodiments of the 
invention described herein. Such equivalents are intended 
to be encompassed by the following claims.
CLAIMS



We claim:

1. Isolated testis-specific DNA which occurs on the
non-recombining region of the human Y chromosome or the
complement thereof.

2. The isolated testis-specific DNA of Claim 1 which
occurs in multiple copies on the non-recombining region
of the human Y chromosome or the complement thereof.

3. The isolated testis-specific DNA of Claim 2 selected
from the group consisting of:
(a) a CDY gene or a characteristic portion thereof;
(b) a BPY1 gene or a characteristic portion thereof;
(c) a BPY2 gene or a characteristic portion thereof;
(d) an XKRY gene or a characteristic portion thereof;
(e) a PTPRY gene or a characteristic portion thereof;
(f) TTY1 DNA or a characteristic portion thereof;
(g) TTY2 DNA or a characteristic portion thereof;
(h) a complement of (a);
(i) a complement of (b);
(j) a complement of (c);
(k) a complement of (d);
(l) a complement of (e);
(m) a complement of (f);
(n) a complement of (g);
(o) DNA encoding the amino acid sequence of SEQ ID
No.: 39;
(p) DNA encoding the amino acid sequence of SEQ ID
No.: 40;
(q) DNA encoding the amino acid sequence of SEQ ID
No.: 42;
(r) DNA encoding the amino acid sequence of SEQ ID
No.: 44;
(s) DNA encoding the amino acid sequence of SEQ ID
No.: 46;
(t) DNA encoding the amino acid sequence of SEQ ID
No.: 48; and
(u) DNA which hybridizes to a DNA of any one of (a)
through (t) under stringent conditions.

4. Isolated testis-specific DNA selected from the group
consisting of:
(a) DNA of SEQ ID No.: 37;
(b) DNA of SEQ ID No.: 38;
(c) DNA of SEQ ID No.: 41;
(d) DNA of SEQ ID No.: 43;
(e) DNA of SEQ ID No.: 45;
(f) DNA of SEQ ID No.: 47;
(g) DNA of SEQ ID No.: 49;
(h) DNA of SEQ ID No.: 50;
(i) DNA encoding the amino acid sequence of SEQ ID
No.: 39;
(j) DNA encoding the amino acid sequence of SEQ ID
No.: 40;
(k) DNA encoding the amino acid sequence of SEQ ID
No.: 42;
(l) DNA encoding the amino acid sequence of SEQ ID
No.: 44;
(m) DNA encoding the amino acid sequence of SEQ ID
No.: 46;
(n) DNA encoding the amino acid sequence of SEQ ID
No.: 48;
(o) a complement of a DNA of any one of (a) through
(n); and
(p) DNA which hybridizes to a DNA of any one of (a)
through (o) under stringent conditions.

5. Isolated X-homologous DNA which occurs on the
non-recombining region of the human Y chromosome, is not
testis-specific and has a homolog on the human X
chromosome.

6. The isolated DNA of Claim 5 selected from the group
consisting of:
(a) a DBY gene or a characteristic portion thereof;
(b) a TPRY gene or a characteristic portion thereof;
(c) a TB4Y gene or a characteristic portion thereof;
(d) an EIF1AY gene or a characteristic portion
thereof;
(e) a DFFRY gene or a characteristic portion
thereof;
(f) a complement of (a);
(g) a complement of (b);
(h) a complement of (c);
(i) a complement of (d);
(j) a complement of (e);
(k) a complement of (f);
(l) DNA encoding the amino acid sequence of SEQ ID
No.: 18;
(m) DNA encoding the amino acid sequence of SEQ ID
No.: 22;
(n) DNA encoding the amino acid sequence of SEQ ID
No.: 23;
(o) DNA encoding the amino acid sequence of SEQ ID
No.: 24;
(p) DNA encoding the amino acid sequence of SEQ ID
No.: 28;
(q) DNA encoding the amino acid sequence of SEQ ID
No.: 32;
(r) DNA encoding the amino acid sequence of SEQ ID
No.: 36; and
(s) DNA which hybridizes to a DNA of any one of (a)
through (r) under stringent conditions.

7. Isolated X-homologous human DNA selected from the group
consisting of:
(a) DNA of SEQ ID No.: 17 or a characteristic portion
thereof;
(b) DNA of SEQ ID No.: 19 or a characteristic
portion thereof;
(c) DNA of SEQ ID No.: 20 or a characteristic
portion thereof;
(d) DNA of SEQ ID No.: 21 or a characteristic
portion thereof;
(e) DNA of SEQ ID No.: 26 or a characteristic
portion thereof;
(f) DNA of SEQ ID No.: 30 or a characteristic
portion thereof;
(g) DNA of SEQ ID No.: 34 or a characteristic
portion thereof;
(h) DNA encoding the amino acid sequence of SEQ ID
No.: 18;
(i) DNA encoding the amino acid sequence of SEQ ID
No.: 22;
(j) DNA encoding the amino acid sequence of SEQ ID
No.: 23;
(k) DNA encoding the amino acid sequence of SEQ ID
No.: 24;
(l) DNA encoding the amino acid sequence of SEQ ID
No.: 28;
(m) DNA encoding the amino acid sequence of SEQ ID
No.: 32;
(n) DNA encoding the amino acid sequence of SEQ ID
No.: 36;
(o) a complement of a DNA of any one of (a) through
(n); and
(p) DNA which hybridizes to a DNA of any one of (a)
through (o) under stringent conditions.

8. A DNA probe comprising all or a characteristic portion
of DNA of Claim 4.

9. A DNA probe comprising all or a characteristic portion
of DNA of Claim 7.

idem for PTPRY, SEQ ID NO: 47, 48

6. Claims: 1-4,8 partially

idem for TTY1, SEQ ID NO: 49

7. Claims: 1-4,8 partially

idem for TTY2, SEQ ID NO: 50

8. Claims: 5-7,9 partially

Isolated DNA which occurs on the non-recombining region of
the human Y chromosome or the complement thereof, not being
testis-specific and having a homolog on the human X
chromosome.
Said DNA being the DBY gene; a characteristic portion, a
probe or complement thereof, a DNA which hybridizes thereto
under stringent conditions.
Said DNA having the SEQ ID NO: 17 and coding for the amino
acid of SEQ ID NO: 18.

9. Claims: 5-7,9 partially

idem for TPRY, SEQ ID NO: 19, 20, 21, 22, 23, 24

10. Claims: 5-7,9 partially

idem for TB4Y, SEQ ID NO: 26, 28

11. Claims: 5-7,9 partially

idem for EIF1AY, SEQ ID NO: 30, 32

12. Claims: 5-7,9 partially

idem for DFFRY, SEQ ID NO: 34, 36